Abstract. The Propagation-Separation approach is an iterative procedure for pointwise estimation of local constant and local polynomial functions. The estimator is defined as a weighted mean of the observations with data-driven weights. Within homogeneous regions it ensures a similar behavior as non-adaptive smoothing (propagation), while avoiding smoothing among distinct regions (separation). In order to enable a proof of stability of estimates, the authors of the original study introduced an additional memory step aggregating the estimators of the successive iteration steps. Here, we study theoretical properties of the simplified algorithm, where the memory step is omitted. In particular, we introduce a new strategy for the choice of the adaptation parameter yielding propagation and stability for local constant functions with sharp discontinuities.



1. Introduction 



The Propagation-Separation approach [Polzehl and Spokoiny, 2006] is an adaptive method for nonparametric estimation. This iterative procedure relates to Lepski's method [Lepskiĭ, 1990, Mathé and Pereverzev, 2006] and extends the Adaptive Weights Smoothing (AWS) procedure from [Polzehl and Spokoiny, 2000]. The Propagation-Separation approach supposes a local parametric model. It is especially powerful in the case of large homogeneous regions and sharp discontinuities. However, it can be extended to local linear or local polynomial parameter functions as well. Hence, the method is applicable to a broad class of nonparametric models. In our study, we concentrate on the local constant model for the sake of simplicity. Important applications can be found in image processing, where the local constant model is often satisfied.

In this study, we aim to provide a better understanding of the procedure and its properties. The crucial point of the algorithm is the choice of the adaptation bandwidth. We present a new formulation of what is known as the propagation condition, which ensures an appropriate choice. This allows the verification of propagation and stability of estimates for local constant parameter functions with sharp discontinuities.



In comparison to the study of Polzehl and Spokoiny [2006], there are two important differences which we want to emphasize. First, we avoid the problematic Assumption S0 on which the theoretical results in Polzehl and Spokoiny [2006] were partially based. Further, we omit the memory step, which was included into the algorithm to enable a theoretical study. In each iteration step, the new estimate is compared with the estimate from the previous iteration step. In case of a significant difference, the new estimate is replaced by a value between the two estimates, providing a smooth transition, that is, relaxation. This is related to the work of Belomestny and Spokoiny [2007] about spatial aggregation of local likelihood estimates. The theoretical results in Polzehl and Spokoiny [2006] are mainly based on the memory step. However, we show for piecewise constant functions that the adaptivity of the method yields similar results even if the memory step is removed from the algorithm. This gains importance as it turned out that, for practical use, the memory step is questionable. Therefore, in later applications of the algorithm the memory step has been omitted, see e.g. Becker et al. [2012], … et al. [2012, 2011], Tabelow et al. [2008], Divine et al. [2008], still yielding the desired behavior in practice. This article aims to justify the simplified Propagation-Separation algorithm, where the memory step is removed.

The outline is as follows. After a short introduction of the model and the estimation procedure, we introduce a new parameter choice strategy for the adaptation bandwidth. Then, we consider some numerical examples that illustrate the general behavior of the algorithm. The main properties, that is, propagation, separation and stability of estimates, will be verified in Section 3 for piecewise constant parameter functions with sharp discontinuities. In Section 4 we justify our new choice of the adaptation bandwidth by analyzing its dependence on the unknown parameter function and by discussing some further questions concerning its application in practice. We finish with a generalization of the setting of our study.

We use two results from Polzehl and Spokoiny [2006] which do not rely on Assumption S0. These are given in Appendix A. In order to avoid confusion we refer to them by (PS1) and (PS2).



2. Model and methodology

In this section we briefly introduce the setting of our study and the estimation procedure result- 
ing from the Propagation-Separation approach. The behavior of the algorithm depends on the 
adaptation bandwidth, and here we introduce a new strategy for its choice. 

2.1. Model. We consider a local parametric model. 

Notation 2.1 (Setting). Let $Z_1, \dots, Z_n$ be independent random variables with $Z_i = (X_i, Y_i) \in \mathcal{X} \times \mathcal{Y}$. Here, the metric space $\mathcal{X}$ denotes the design space and $\mathcal{Y} \subseteq \mathbb{R}$ the observation space. The observations $Y_i$ are assumed to follow the distribution $P_{\theta(X_i)} \in \mathcal{P}$, where $\mathcal{P}$ denotes some parametric family of probability distributions and $\theta: \mathcal{X} \to \Theta \subseteq \mathbb{R}$ is the parameter function that we aim to estimate. We suppose the design $\{X_i\}_{i=1}^n$ to be known.

Typical examples of this general setting are Gaussian regression or the inhomogeneous Bernoulli, exponential, and Poisson models, see [Polzehl and Spokoiny, 2006, Section 2] for a detailed description. In general, the procedure may work for any vector space $\mathcal{Y}$ with $Y_i \sim P_{\theta(X_i)}$, $\theta: \mathcal{X} \to \Theta \subseteq \mathcal{M}$, where $\mathcal{M}$ is a metric space. Following Polzehl and Spokoiny [2006], we suppose the parametric family to be an exponential family with standard regularity conditions. This allows an explicit expression of the Kullback-Leibler divergence, simplifying our subsequent analysis.

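Assumption A1 (Local exponential family model). $\mathcal{P} = (P_\theta, \theta \in \Theta)$ is an exponential family with a compact and convex parameter set $\Theta$ and non-decreasing functions $C, B \in C^2(\Theta, \mathbb{R})$ such that

$$p(y, \theta) := \frac{dP_\theta}{dP}(y) = p(y) \exp\big[T(y)C(\theta) - B(\theta)\big], \quad \theta \in \Theta,$$

where $p(y)$ is some non-negative function on $\mathcal{Y}$, $T: \mathcal{Y} \to \mathbb{R}$, and $B'(\theta) = \theta C'(\theta)$. For the parameter $\theta$ it holds

$$\int p(y, \theta)\, P(dy) = 1 \quad\text{and}\quad E_\theta[T(Y)] = \int T(y)\, p(y, \theta)\, P(dy) = \theta. \tag{2.1}$$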
Remark 2.2.

■ In [Polzehl and Spokoiny, 2006, Assumption (A1)], the authors assumed $T(y) = y$, i.e. the identity map. Any invertible transformation $T$ leaves the Kullback-Leibler divergence unchanged. Since the results (PS1) and (PS2), see Appendix A, depend on the Kullback-Leibler divergence only, they remain valid for invertible maps $T$. In this study, we consider the general case explicitly in order to clarify where this transformation $T$ comes into play.

■ Equation (2.1), i.e. $E_\theta[T(Y)] = \theta$, can be achieved via reparametrization with $\theta := t(\vartheta)$, where $t(\vartheta) := E_\vartheta[T(Y)]$. However, this leads to estimation of $\theta$ instead of $\vartheta$, such that the theoretical properties in Section 3 do not apply to $\vartheta$. This will be discussed in Section 4.3.

■ A list of parametric families satisfying Assumption (A1), possibly after reparametrization, is given in Appendix B.

■ We suppose Assumption (A1) throughout this article, while all later assumptions will be required for specific results only.

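In our subsequent analysis the notions of the Kullback-Leibler divergence, given here as

$$\mathcal{KL}(P_\theta, P_{\theta'}) := \int \ln\left(\frac{p(y, \theta)}{p(y, \theta')}\right) P_\theta(dy), \quad \theta, \theta' \in \Theta,$$

and the Fisher information

$$I(\theta) := -E_\theta\left[\frac{\partial^2}{\partial \theta^2} \log p(Y, \theta)\right], \quad \theta \in \Theta,$$

will be important.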

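Lemma 2.3 (Fisher information and Kullback-Leibler divergence). Under Assumption (A1) we have that $I(\theta) = C'(\theta)$, $\theta \in \Theta$. Moreover, the following holds.

■ For every constant $\varkappa > 1$ there is a compact and convex subset $\Theta_\varkappa \subseteq \Theta$ such that

$$\frac{I(\theta_1)}{I(\theta_2)} \le \varkappa^2, \quad \theta_1, \theta_2 \in \Theta_\varkappa. \tag{2.2}$$

■ The Kullback-Leibler divergence is convex w.r.t. the first argument. It satisfies

$$\mathcal{KL}(P_\theta, P_{\theta'}) = \theta\big[C(\theta) - C(\theta')\big] - \big[B(\theta) - B(\theta')\big] = I(\tilde\theta)\,(\theta - \theta')^2/2 \tag{2.3}$$

for some $\tilde\theta$ between $\theta$ and $\theta'$.

Proof sketch. The first assertion follows from $B'(\theta) = \theta C'(\theta)$: differentiating gives $B''(\theta) = C'(\theta) + \theta C''(\theta)$, hence $I(\theta) = B''(\theta) - \theta C''(\theta) = C'(\theta)$. Then, Equation (2.2) holds due to the compactness of $\Theta_\varkappa$ and $C \in C^2(\Theta, \mathbb{R})$. The convexity is satisfied since the second derivative of the Kullback-Leibler divergence w.r.t. the first argument is non-negative, $C$ being non-decreasing:

$$\frac{\partial^2}{\partial \theta^2} \mathcal{KL}(P_\theta, P_{\theta'}) = C'(\theta) \ge 0.$$

For the last identity in (2.3), note that $\frac{\partial}{\partial \theta'} \mathcal{KL}(P_\theta, P_{\theta'}) = (\theta' - \theta)\, C'(\theta')$, which vanishes at $\theta' = \theta$. Integrating from $\theta$ to $\theta'$ and applying the mean value theorem for integrals yields

$$\mathcal{KL}(P_\theta, P_{\theta'}) = C'(\tilde\theta)\,(\theta - \theta')^2/2 = I(\tilde\theta)\,(\theta - \theta')^2/2,$$

where $\tilde\theta$ lies between $\theta$ and $\theta'$. □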
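As a numerical sanity check of Lemma 2.3, the following Python sketch compares the closed form (2.3) with direct summation for the Poisson family. It relies on the standard representation $p(y, \theta) = \exp[y \log\theta - \theta]/y!$, i.e. $T(y) = y$, $C(\theta) = \log\theta$ and $B(\theta) = \theta$, so that $B'(\theta) = \theta C'(\theta)$ holds. It also recovers the intermediate point $\tilde\theta$ in (2.3) from $I(\theta) = C'(\theta) = 1/\theta$. The helper names are hypothetical and only serve this illustration.

```python
import math


def kl_closed_form(t1, t2, C, B):
    """Closed form (2.3): KL(P_t1, P_t2) = t1 [C(t1) - C(t2)] - [B(t1) - B(t2)]."""
    return t1 * (C(t1) - C(t2)) - (B(t1) - B(t2))


def kl_poisson_by_summation(t1, t2, ymax=400):
    """Direct evaluation of sum_y p(y, t1) log(p(y, t1) / p(y, t2)) for Poisson laws."""
    total = 0.0
    for y in range(ymax + 1):
        log_p1 = y * math.log(t1) - t1 - math.lgamma(y + 1)
        log_p2 = y * math.log(t2) - t2 - math.lgamma(y + 1)
        total += math.exp(log_p1) * (log_p1 - log_p2)
    return total


def c_poisson(theta):
    """C(theta) = log(theta), so I(theta) = C'(theta) = 1 / theta."""
    return math.log(theta)


def b_poisson(theta):
    """B(theta) = theta, so B'(theta) = 1 = theta * C'(theta)."""
    return theta


t1, t2 = 3.0, 5.0
kl = kl_closed_form(t1, t2, c_poisson, b_poisson)
kl_sum = kl_poisson_by_summation(t1, t2)

# (2.3) with I(theta) = 1/theta gives KL = (t1 - t2)^2 / (2 * theta_tilde).
theta_tilde = (t1 - t2) ** 2 / (2.0 * kl)
print(f"closed form {kl:.8f}, summation {kl_sum:.8f}, theta_tilde {theta_tilde:.4f}")
```

The recovered $\tilde\theta$ lies between 3 and 5, as the mean value form of (2.3) requires.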

The set $\Theta_\varkappa$ should be sufficiently large such that $\theta(X_i) \in \Theta_\varkappa$ holds for all $i \in \{1, \dots, n\}$. Later on, we require that even the corresponding estimators are elements of $\Theta_\varkappa$, see Assumption (A2). In Remark 3.2 we discuss how this can be achieved without increasing $\varkappa$ overly.



2.2. Methodology of the Propagation-Separation approach. The algorithm is iterative, and in each iteration step the pointwise estimator of the parameter function is defined as a weighted mean of the observations. In each design point the weights are chosen adaptively as the product of two kernel functions. The location kernel acts on the design space $\mathcal{X}$, and the adaptation kernel compares the pointwise parameter estimates of the previous iteration step in terms of the Kullback-Leibler divergence. For each of the two kernels, a bandwidth controls how much information is taken into account. The location bandwidth increases with the number of iterations. Starting at a small vicinity, in each iteration step the considered region is extended. The increasing number of included observations enables a monotone variance reduction during iteration, while the adaptation kernel leads to a decreasing or (in case of model misspecification) bounded estimation bias. It will be clear from the subsequent analysis that, by doing so, one obtains similar results as non-adaptive smoothing within homogeneity regions (propagation) and avoids smoothing across structural borders (separation).

We turn to a formal description and start by introducing some notation.

Notation 2.4.

■ $\Delta$ denotes a metric on $\mathcal{X}$;

■ $\mathcal{KL}(\theta, \theta') := \mathcal{KL}(P_\theta, P_{\theta'})$ is the Kullback-Leibler divergence of $P_\theta$ and $P_{\theta'}$, $\theta, \theta' \in \Theta$;

■ $K_{loc}, K_{ad}: \mathbb{R}_+ \to [0, 1]$ are non-increasing kernels with compact support $[0, 1]$ and $K(0) = 1$, where $K_{loc}$ denotes the location and $K_{ad}$ the adaptation kernel;

■ $\{h^{(k)}\}_{k=0}^{k^*}$ is an increasing sequence of bandwidths for the location kernel with $h^{(0)} > 0$;

■ $\lambda > 0$ is the bandwidth of the adaptation kernel;

■ $U_i^{(k)} := \{X_j \in \mathcal{X} : \Delta(X_i, X_j) \le h^{(k)}\}$.

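For comparison and the initialization of the algorithm we define the non-adaptive estimator $\tilde\theta_i^{(k)}$.

Definition 2.5 (Non-adaptive estimator). Let $i \in \{1, \dots, n\}$ and $k \in \{0, \dots, k^*\}$. The non-adaptive estimator $\tilde\theta_i^{(k)}$ of $\theta_i := \theta(X_i)$ is defined by

$$\tilde\theta_i^{(k)} := \sum_{j=1}^n \tilde w_{ij}^{(k)}\, T(Y_j) \,/\, \tilde N_i^{(k)}$$

with weights $\tilde w_{ij}^{(k)} := K_{loc}\big(\Delta(X_i, X_j)/h^{(k)}\big)$ and $\tilde N_i^{(k)} := \sum_{j=1}^n \tilde w_{ij}^{(k)}$.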

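Corollary 2.6 (Relation to maximum likelihood estimation). Assumption (A1) implies that the standard local weighted maximum likelihood estimator

$$\tilde\theta_i^{(k),\mathrm{MLE}} := \operatorname*{argsup}_{\theta \in \Theta} L(\tilde W_i^{(k)}, \theta) \quad\text{with}\quad L(\tilde W_i^{(k)}, \theta) := \sum_{j=1}^n \tilde w_{ij}^{(k)} \log p(Y_j, \theta),$$

where $\tilde W_i^{(k)} := \{\tilde w_{ij}^{(k)}\}_j$, equals the non-adaptive estimator $\tilde\theta_i^{(k)}$ in Definition 2.5. Indeed, by $B'(\theta) = \theta C'(\theta)$ the first-order condition reads $C'(\theta)\big[\sum_j \tilde w_{ij}^{(k)} T(Y_j) - \tilde N_i^{(k)} \theta\big] = 0$. Further, it follows for the "fitted log-likelihood" with $\theta \in \Theta$ that

$$L(\tilde W_i^{(k)}, \tilde\theta_i^{(k)}, \theta) := L(\tilde W_i^{(k)}, \tilde\theta_i^{(k)}) - L(\tilde W_i^{(k)}, \theta) = \tilde N_i^{(k)}\, \mathcal{KL}(\tilde\theta_i^{(k)}, \theta).$$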
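Both statements of Corollary 2.6 can be checked numerically. The sketch below again uses the Poisson family ($T(y) = y$, $C = \log$, $B$ the identity); the weights and counts are made-up illustration data. It checks that the weighted mean maximizes the weighted log-likelihood on a grid and that the fitted log-likelihood equals $\tilde N_i^{(k)}\, \mathcal{KL}(\tilde\theta_i^{(k)}, \theta)$.

```python
import math


def log_p_poisson(y, theta):
    """log p(y, theta) = y log(theta) - theta - log(y!)."""
    return y * math.log(theta) - theta - math.lgamma(y + 1)


def weighted_loglik(weights, ys, theta):
    """L(W, theta) = sum_j w_j log p(Y_j, theta)."""
    return sum(w * log_p_poisson(y, theta) for w, y in zip(weights, ys))


def kl_poisson(t1, t2):
    """Closed form (2.3) for the Poisson family."""
    return t1 * math.log(t1 / t2) - (t1 - t2)


weights = [1.0, 0.8, 0.5, 0.2, 0.0]   # illustrative location weights w_ij
ys = [2, 0, 5, 3, 1]                  # illustrative counts Y_j
n_tilde = sum(weights)
theta_tilde = sum(w * y for w, y in zip(weights, ys)) / n_tilde  # Definition 2.5

# The weighted mean maximizes L(W, .), checked on a grid of candidate values.
grid = [0.05 * m for m in range(1, 200)]
best = max(grid, key=lambda t: weighted_loglik(weights, ys, t))

# Fitted log-likelihood: L(W, theta_tilde) - L(W, theta) = N_tilde * KL(theta_tilde, theta).
theta = 1.7
fitted = weighted_loglik(weights, ys, theta_tilde) - weighted_loglik(weights, ys, theta)
print(theta_tilde, best, fitted, n_tilde * kl_poisson(theta_tilde, theta))
```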

Now, we present the (slightly modified) algorithm of the Propagation-Separation approach allowing $T(y) \neq y$ and omitting the memory step [Polzehl and Spokoiny, 2006, Section 3.2] by setting $\eta_i = 1$. More details can be found in [Polzehl and Spokoiny, 2006, Section 3].



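Algorithm 1 (Propagation-Separation approach).

■ Input parameters: Sequence of bandwidths $\{h^{(k)}\}_{k=0}^{k^*}$ and adaptation bandwidth $\lambda$.

■ Initialization: $\hat\theta_i^{(0)} := \tilde\theta_i^{(0)}$ and $N_i^{(0)} := \tilde N_i^{(0)}$ for all $i \in \{1, \dots, n\}$, $k := 1$.

■ Iteration: Do for every $i = 1, \dots, n$

$$\hat\theta_i^{(k)} := \sum_{j=1}^n w_{ij}^{(k)}\, T(Y_j) \,/\, N_i^{(k)} \tag{2.4}$$

with weights $w_{ij}^{(k)} := K_{loc}\big(\Delta(X_i, X_j)/h^{(k)}\big) \cdot K_{ad}\big(s_{ij}^{(k)}/\lambda\big)$, where $s_{ij}^{(k)} := N_i^{(k-1)}\, \mathcal{KL}(\hat\theta_i^{(k-1)}, \hat\theta_j^{(k-1)})$ and $N_i^{(k)} := \sum_{j=1}^n w_{ij}^{(k)}$.

■ Stopping: Stop if $k = k^*$, otherwise increase $k$ by 1.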
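To make the iteration concrete, here is a minimal NumPy sketch of Algorithm 1 for the local constant Gaussian model with known unit variance ($T(y) = y$, $\mathcal{KL}(\theta, \theta') = (\theta - \theta')^2/2$) on a one-dimensional design with $\Delta(x, x') = |x - x'|$. The kernels are assumptions: $K_{loc}(u) = 1 - u^2$ on $[0, 1]$ (our reading of (2.5)) and a plateau kernel $K_{ad}(u) = \min\{1, 2 - 2u\}_+$, chosen only because it meets the requirements of Notation 2.4. The function name `ps_estimate` is hypothetical; this is not the implementation in the R package aws.

```python
import numpy as np


def k_loc(u):
    """Location kernel 1 - u^2 on [0, 1], zero outside."""
    u = np.asarray(u, dtype=float)
    return np.where(u <= 1.0, 1.0 - u**2, 0.0)


def k_ad(u):
    """Assumed plateau adaptation kernel min{1, 2 - 2u}_+ (support [0, 1], K(0) = 1)."""
    return np.clip(2.0 - 2.0 * np.asarray(u, dtype=float), 0.0, 1.0)


def kl_gauss(a, b, sigma2=1.0):
    """KL(N(a, sigma2), N(b, sigma2))."""
    return (a - b) ** 2 / (2.0 * sigma2)


def ps_estimate(x, y, bandwidths, lam, sigma2=1.0):
    """Simplified Propagation-Separation (Algorithm 1 without memory step).

    bandwidths: increasing sequence h^(0), ..., h^(k*); lam: adaptation bandwidth
    (np.inf gives non-adaptive smoothing). Returns theta_hat^(k*) and N^(k*).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])           # Delta(X_i, X_j)

    # Initialization: non-adaptive estimator with h^(0).
    w = k_loc(dist / bandwidths[0])
    n_w = w.sum(axis=1)
    theta = w @ y / n_w

    for h in bandwidths[1:]:
        # Statistical penalty s_ij^(k) = N_i^(k-1) KL(theta_i^(k-1), theta_j^(k-1)).
        s = n_w[:, None] * kl_gauss(theta[:, None], theta[None, :], sigma2)
        w = k_loc(dist / h) * k_ad(s / lam)          # adaptive weights w_ij^(k)
        n_w = w.sum(axis=1)                          # N_i^(k) >= 1 since w_ii = 1
        theta = w @ y / n_w                          # Equation (2.4) with T(y) = y
    return theta, n_w
```

For example, `ps_estimate(np.arange(1, 1001), y, 0.5 * 1.25 ** np.arange(34), lam=14.6)` runs 33 iterations on a design like the one in Section 2.4; since the kernels and the variance are assumptions, the output need not match Figure 1.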

Remark 2.7 (Choice of the input parameters).

■ The amount of adaptivity is determined by the adaptation bandwidth $\lambda$, which can be specified by the propagation condition independently of the observations at hand, see Sections 2.3 and 4.1 and [Polzehl and Spokoiny, 2006, Sections 3.4 and 3.5]. The choice $\lambda = \infty$ yields non-adaptive smoothing.

■ The initial location bandwidth $h^{(0)}$ should be sufficiently small in order to avoid smoothing among distinct homogeneous compartments before adaptation starts. In practice, any choice of $h^{(0)}$ such that $U_i^{(0)} = \{X_i\}$ for every $i \in \{1, \dots, n\}$ seems advisable. Its drawback is discussed in Remark 3.2.

■ The sequence of bandwidths $\{h^{(k)}\}_{k=0}^{k^*}$ can be chosen such that $h^{(k)} := a^k h^{(0)}$ with $a = 1.25^{1/d}$ if $d$ denotes the dimension of the design space $\mathcal{X}$, see [Polzehl and Spokoiny, 2006, Section 3.4]. Alternatively, we could ensure a constant variance reduction of the estimator, see [Becker et al., 2012].

■ Note that the procedure provides an intrinsic stopping criterion yielding a certain stability of estimates, see Section 3 and the simulations in Figures 1 and 2. Hence, the maximal bandwidth $h^{(k^*)}$, specified by the maximal number of iterations $k^*$, is only bounded by the available computation time.
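The geometric schedule from the third item can be sketched as follows. The values $h^{(0)} = 0.6$ and a maximal bandwidth of 1000 are illustrative choices; with unit-spaced design points, $h^{(0)} < 1$ gives $U_i^{(0)} = \{X_i\}$. The helper name is hypothetical.

```python
def geometric_bandwidths(h0, hmax, d=1, base=1.25):
    """h^(k) = a^k h^(0) with a = base^(1/d), stopping before h exceeds hmax."""
    a = base ** (1.0 / d)
    hs = [h0]
    while hs[-1] * a <= hmax:
        hs.append(hs[-1] * a)
    return hs


hs = geometric_bandwidths(h0=0.6, hmax=1000.0)
k_star = len(hs) - 1   # index of the last iteration step
print(k_star, hs[-1])
```

With these illustrative inputs the schedule has $k^* = 33$; doubling the dimension to $d = 2$ roughly doubles the number of steps, because each step then enlarges the bandwidth by the factor $1.25^{1/2}$ only.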

2.3. Propagation condition. As mentioned above, an appropriate choice of the adaptation bandwidth $\lambda$ is crucial for the behavior of the algorithm. Polzehl and Spokoiny [2006, Section 3.5] suggested a choice, called propagation condition. The basic idea is that the impact of the statistical penalty in the adaptive weights should be negligible under homogeneity, yielding almost free smoothing within homogeneous regions. More precisely, the authors proposed to adjust $\lambda$ by Monte-Carlo simulations in accordance with the following criterion, where an artificial data set is considered.

"(...) the parameter $\lambda$ can be selected as the minimal value of $\lambda$ that, in case of a homogeneous (parametric) model $\theta(x) = \theta$, provides a prescribed probability to obtain the global model at the end of the iteration process."

Here, we formally introduce a new criterion which allows, in the setting of Algorithm 1, the verification of propagation and stability under (local) homogeneity. Additionally, it provides a better interpretability than earlier formulations, see e.g. Polzehl et al. [2010].



Under homogeneity, i.e. if $\theta(\cdot) \equiv \theta$, (PS2) in Appendix A shows that the non-adaptive estimator satisfies

$$P\big(\tilde N_i^{(k)}\, \mathcal{KL}(\tilde\theta_i^{(k)}, \theta) > z\big) \le 2e^{-z}$$

for all $i \in \{1, \dots, n\}$ and every $k \in \{0, \dots, k^*\}$. Hence, $\mathcal{KL}(\tilde\theta_i^{(k)}, \theta)$ decreases at least with rate $1/\tilde N_i^{(k)}$. The following condition ensures a similar behavior for the adaptive estimator. We introduce the function $\mathfrak{z}_\lambda: \{0, \dots, k^*\} \times (0, 1) \times \Theta \to \mathbb{R}_+$ with $\lambda > 0$, defined as

$$\mathfrak{z}_\lambda(k, p; \theta) := \inf\Big\{z > 0 : P\big(N_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}(\lambda), \theta) > z\big) \le p\Big\},$$

where $\hat\theta_i^{(k)}(\lambda)$ denotes the adaptive estimator resulting from the Propagation-Separation approach with adaptation bandwidth $\lambda > 0$ and observations $Y_i \sim P_\theta$ for all $i \in \{1, \dots, n\}$, i.e. $\theta(\cdot) \equiv \theta$.

Definition 2.8 (Propagation condition). We say that $\lambda$ is chosen in accordance with the propagation condition at level $\varepsilon > 0$ for $\theta \in \Theta$ if the function $\mathfrak{z}_\lambda(\cdot, p; \theta)$ is non-increasing for all $p \in (\varepsilon, 1)$.
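Since $\mathfrak{z}_\lambda(k, p; \theta)$ is the $(1 - p)$-quantile of $N_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}(\lambda), \theta)$ under homogeneity, it can be approximated by simulation on an artificial data set. The brute-force sketch below is not the approximation of Section 4.2. It assumes the Gaussian model with unit variance on the design $\{1, \dots, n\}$, the location kernel $K_{loc}(u) = 1 - u^2$, an assumed plateau adaptation kernel $\min\{1, 2 - 2u\}_+$, and, as a simplification, it pools the samples over all design points $i$ instead of fixing one $i$. All names are hypothetical.

```python
import numpy as np


def k_loc(u):
    """Location kernel 1 - u^2 on [0, 1]."""
    return np.where(u <= 1.0, 1.0 - u**2, 0.0)


def k_ad(u):
    """Assumed plateau adaptation kernel min{1, 2 - 2u}_+."""
    return np.clip(2.0 - 2.0 * u, 0.0, 1.0)


def z_lambda_mc(lam, theta, bandwidths, p, n=40, reps=1000, seed=0):
    """Monte Carlo estimate of z_lambda(k, p; theta) for k = 0, ..., k*."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1, dtype=float)
    dist = np.abs(x[:, None] - x[None, :])
    y = theta + rng.standard_normal((reps, n))            # Y_i ~ N(theta, 1)

    w0 = k_loc(dist / bandwidths[0])
    n_w = np.tile(w0.sum(axis=1), (reps, 1))               # N_i^(0), shape (reps, n)
    est = y @ w0.T / n_w                                   # theta_hat_i^(0)
    stats = [n_w * (est - theta) ** 2 / 2.0]               # N_i^(k) KL(theta_hat_i^(k), theta)

    for h in bandwidths[1:]:
        s = n_w[:, :, None] * (est[:, :, None] - est[:, None, :]) ** 2 / 2.0
        w = k_loc(dist / h)[None, :, :] * k_ad(s / lam)    # shape (reps, n, n)
        n_w = w.sum(axis=2)
        est = np.einsum("rij,rj->ri", w, y) / n_w
        stats.append(n_w * (est - theta) ** 2 / 2.0)

    # (1 - p)-quantile, pooled over replications and design points.
    return np.array([np.quantile(s_k, 1.0 - p) for s_k in stats])


def is_nonincreasing(z, tol=0.0):
    """Check Definition 2.8 for a single level p, up to a Monte Carlo tolerance."""
    return bool(np.all(np.diff(z) <= tol))
```

In practice one would evaluate `z_lambda_mc` for a candidate $\lambda$ and several $p \in (\varepsilon, 1)$ and apply `is_nonincreasing` with a tolerance reflecting the simulation error.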

As before, the propagation condition is formulated w.r.t. some fixed parameter $\theta \in \Theta$. In practice, the parameter function $\theta(\cdot)$ is unknown. Hence, we need to ensure that the propagation condition is satisfied for all $\theta_i$ with $i \in \{1, \dots, n\}$. At best, the choice of $\lambda$ by the propagation condition is independent of the underlying parameter $\theta$. The study in Section 4.1 points out that this is the case for Gaussian and exponential distributions and, as a consequence, for log-normal, Rayleigh, Weibull, and Pareto distributions. Otherwise, we recommend to identify some parameter $\theta^*$ yielding a sufficiently large choice of the adaptation bandwidth $\lambda$ such that the propagation condition remains valid for all $\theta_i$ with $i \in \{1, \dots, n\}$, see Section 4.1 for more details.
Remark 2.9.

■ In Section 4.1 we consider some examples for Gaussian, exponential and Poisson distributions, see Figures 3, 4 and 5.

■ If the function $\mathfrak{z}_\lambda(\cdot, p_0; \theta^*)$, $\theta^* \in \Theta$, in Definition 2.8 is non-increasing for some $p_0 \in (0, 1)$, then it is non-increasing for all $p > p_0$ by monotonicity.

■ The propagation condition yields a lower bound for the choice of $\lambda$. In general, it is advantageous to allow as much adaptation as possible without violating the propagation condition. Hence, the optimal choice of $\lambda$ is

$$\lambda_{\mathrm{opt}}(\varepsilon, \theta) := \inf\big\{\lambda > 0 : \mathfrak{z}_\lambda(\cdot, \varepsilon; \theta) \text{ is a non-increasing function}\big\}.$$

■ In Theorem 1 we need $\varepsilon$ to be strictly smaller than $1/n$. However, this is based on a quite rough upper bound. In practice, it seems advantageous to choose $\varepsilon$ appropriately for the respective application. Note that $\lambda_{\mathrm{opt}}(\varepsilon, \theta)$ increases if $\varepsilon$ decreases.

■ The probability $P\big(N_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}(\lambda), \theta) > z\big)$ cannot be calculated exactly. In Section 4.2 we introduce an appropriate approximation which can be used in practice.
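Given any numerical check of the propagation condition, for instance one based on simulations under homogeneity, $\lambda_{\mathrm{opt}}(\varepsilon, \theta)$ can be searched for by bisection. The sketch below assumes, without proof, that the condition is monotone in $\lambda$ (violated below $\lambda_{\mathrm{opt}}$, satisfied above); `condition_holds` is a hypothetical user-supplied predicate.

```python
def lambda_opt(condition_holds, lam_lo, lam_hi, tol=1e-3):
    """Bisection for inf{lam > 0 : condition_holds(lam)}.

    Assumes condition_holds is False on (0, lam_opt) and True on [lam_opt, inf).
    """
    if not condition_holds(lam_hi):
        raise ValueError("the condition must hold at lam_hi")
    if condition_holds(lam_lo):
        return lam_lo
    while lam_hi - lam_lo > tol:
        mid = 0.5 * (lam_lo + lam_hi)
        if condition_holds(mid):
            lam_hi = mid          # mid is admissible: the infimum is at most mid
        else:
            lam_lo = mid          # mid is not admissible: the infimum lies above mid
    return lam_hi
```

The returned value is always admissible under the monotonicity assumption and lies within `tol` of the infimum.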



2.4. Some heuristic observations. In order to provide some intuition, we illustrate the general behavior of Algorithm 1 on two examples, see Figures 1 and 2. We apply the R-package aws [Polzehl, 2012]. Here, the memory step is skipped by default. It can be included by setting memory = TRUE.
Figure 1. Results of Algorithm 1 (black line) for the piecewise constant parameter function $\theta_1(\cdot)$ (red line) with adaptation bandwidth $\lambda_1 = 14.6$ and location bandwidths (from left to right) $h_1 = 17.1, 52, 947$. The green circles represent the Gaussian distributed observations.

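On $\mathcal{X} := \{1, \dots, 1000\}$, the first test function is piecewise constant

$$\theta_1(x) := \begin{cases} 0, & x \in \{1, \dots, 200\},\\ 2, & x \in \{201, \dots, 400\},\\ 3, & x \in \{401, \dots, 550\},\\ 2.5, & x \in \{551, \dots, 700\},\\ 2, & x \in \{701, \dots, 850\},\\ 2.5, & x \in \{851, \dots, 1000\}, \end{cases}$$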

and the second one is piecewise polynomial 

{x/300, if X G {1,...,300} 

4 + ((x/100 - 5))V2, if X G {301, 800} 
15-2x/100, if X G {801,. ..,1000}. 

The observations follow a Gaussian distribution, i.e. $Y_i \sim \mathcal{N}(\theta(X_i), 1)$.
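For readers who want to reproduce a setting like Figure 1 outside of R, here is a sketch of $\theta_1$ and the Gaussian observations in Python with NumPy. The seed is arbitrary, so the noise will not match the figure; the function names are hypothetical.

```python
import numpy as np

# Right end points of the first five intervals of theta_1 and the six levels.
EDGES = np.array([200, 400, 550, 700, 850])
LEVELS = np.array([0.0, 2.0, 3.0, 2.5, 2.0, 2.5])


def theta1(x):
    """Piecewise constant test function theta_1 on {1, ..., 1000}."""
    return LEVELS[np.searchsorted(EDGES, np.asarray(x), side="left")]


def simulate(seed=0):
    """Observations Y_i ~ N(theta_1(X_i), 1) on the design X = {1, ..., 1000}."""
    x = np.arange(1, 1001)
    rng = np.random.default_rng(seed)
    return x, theta1(x) + rng.standard_normal(x.size)
```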

The plots were provided by the function aws setting hmax := $h^{(k^*)}$ := 1000 and lkern = "Triangle", such that

$$K_{loc}(x) := 1 - x^2 \quad\text{and}\quad K_{ad}(x) := \min\{1, 2 - x\}_+. \tag{2.5}$$

In Figure 1 we show the results for the piecewise constant function $\theta_1(\cdot)$ with $\lambda_1 = 14.6$ and increasing location bandwidths $h_1 = 17.1, 52, 947$ corresponding to the iteration steps $k_1 = 15, 20, 33$. Figure 2 is based on the piecewise smooth function $\theta_2(\cdot)$ setting $\lambda_2 = 16$ and $h_2 = 4.42, 41.6, 947$, that is $k_2 = 9, 19, 33$. For both examples, it holds $k^* = 33$, representing the final iteration step. The corresponding mean squared error (MSE) is similar to the MSE in step $k_1 = 15$ and $k_2 = 9$, respectively. In the steps $k_1 = 20$ and $k_2 = 19$ the MSE is minimal.

Figure 2. Results of Algorithm 1 (black line) for the piecewise polynomial parameter function $\theta_2(\cdot)$ (red line) with adaptation bandwidth $\lambda_2 = 16$ and location bandwidths (from left to right) $h_2 = 4.42, 41.6, 947$. The green circles correspond to the Gaussian distributed observations.

We summarize the following heuristic observations.

■ Homogeneous compartments with sufficiently large discontinuities are separated by the algorithm, leading to a consistent estimator, see $x \in \{1, \dots, 400\}$ in Figure 1.

■ If the discontinuities are too small, separation fails. Then, different homogeneous compartments are treated as one, yielding a bounded estimation bias. This is illustrated in the right part of Figure 1, where $x \in \{401, \dots, 1000\}$.

■ In Figure 2 we consider the case of model misspecification, that is, a parameter function $\theta(\cdot)$ that is not piecewise constant. Here, the algorithm forces the final estimator into a step function. The step size depends mainly on the smoothness of the parameter function $\theta(\cdot)$ and the adaptation bandwidth $\lambda$. However, the estimation bias can be reduced by an accurate stopping criterion. The maximal location bandwidth should be chosen such that the non-adaptive estimator in Definition 2.5 behaves well within regions without discontinuities. Then, supposing an appropriate choice of the adaptation bandwidth $\lambda$, Algorithm 1 would yield similar results as non-adaptive smoothing within these regions, while smoothing among distinct regions would be avoided, as sharp discontinuities could be detected by the adaptive weights.

Thus, the heuristic properties are quite clear. However, the iterative approach complicates a theoretical verification considerably. Therefore, in Section 3 we concentrate on piecewise constant functions with sharp discontinuities. Here, our new propagation condition, see Section 2.3, ensures propagation within homogeneous regions and stability of estimates due to separation of distinct compartments. The case of model misspecification will be analyzed in an upcoming study.



3. Theoretical properties 



Now, we analyze the behavior of the algorithm in more detail. First, we consider a homogeneous setting, where propagation and stability of estimates follow as a direct consequence of the propagation condition. Then, we show the separation property. For locally constant parameter functions with sufficiently sharp discontinuities, this restricts smoothing to the respective homogeneous regions, yielding again propagation and a certain stability of estimates. We assume that we have identified $\lambda$ and $\varepsilon$ such that the propagation condition holds.



3.1. Propagation and stability under homogeneity. We show for a homogeneous setting that the propagation condition yields, with (PS2) in Appendix A, an exponential bound for the excess probability $P\big(N_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}, \theta) > z\big)$ of the Kullback-Leibler divergence between the adaptive estimator and the true parameter $\theta$.

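Proposition 3.1 (Propagation and stability under homogeneity). Suppose $\theta(\cdot) \equiv \theta$, Assumption (A1), and let the adaptation bandwidth $\lambda$ be chosen in accordance with the propagation condition at level $\varepsilon$ for $\theta \in \Theta$. Then, for each $i \in \{1, \dots, n\}$, $k \in \{0, \dots, k^*\}$, and all $z > 0$, it holds

$$P\big(N_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}, \theta) > z\big) \le \max\{2e^{-z}, \varepsilon\}. \tag{3.1}$$

In particular, we get for all $k' > k$ that

$$P\big(N_i^{(k')}\, \mathcal{KL}(\hat\theta_i^{(k')}, \theta) > z\big) \le \max\Big\{P\big(N_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}, \theta) > z\big), \varepsilon\Big\}. \tag{3.2}$$

Proof. Equation (3.2) follows from the propagation condition, which ensures that the function $\mathfrak{z}_\lambda(\cdot, p; \theta)$ is non-increasing for all $p \in (\varepsilon, 1)$. Since $\hat\theta_i^{(0)} = \tilde\theta_i^{(0)}$ and $N_i^{(0)} = \tilde N_i^{(0)}$, see Algorithm 1, this yields

$$P\big(N_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}, \theta) > z\big) \overset{(3.2)}{\le} \max\Big\{P\big(\tilde N_i^{(0)}\, \mathcal{KL}(\tilde\theta_i^{(0)}, \theta) > z\big), \varepsilon\Big\} \overset{(\mathrm{PS2})}{\le} \max\{2e^{-z}, \varepsilon\},$$

leading to the assertion. □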
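The last step of the proof uses (PS2) for the non-adaptive initial estimator. In the Gaussian case with unit variance this bound can be checked exactly: $\tilde\theta_i - \theta$ is normal with variance $\sum_j \tilde w_{ij}^2 / \tilde N_i^2$, so $\tilde N_i\, \mathcal{KL}(\tilde\theta_i, \theta) = \big(\sum_j \tilde w_{ij}^2 / (2\tilde N_i)\big)\chi^2_1$ and its tail equals $\operatorname{erfc}\big(\sqrt{z \tilde N_i / \sum_j \tilde w_{ij}^2}\big)$. Since $\tilde w_{ij} \le 1$, this is at most $\operatorname{erfc}(\sqrt z) \le e^{-z}$. The sketch below evaluates this tail next to the right-hand side of (3.1); the weights are illustrative.

```python
import math


def tail_non_adaptive_gauss(z, weights):
    """Exact P(N_tilde * KL(theta_tilde, theta) > z) for Y_j ~ N(theta, 1)."""
    n_tilde = sum(weights)
    s2 = sum(w * w for w in weights)
    # N_tilde * KL = (s2 / (2 N_tilde)) * Z^2 with Z standard normal.
    return math.erfc(math.sqrt(z * n_tilde / s2))


def bound_31(z, eps):
    """Right-hand side of (3.1): max{2 exp(-z), eps}."""
    return max(2.0 * math.exp(-z), eps)


weights = [1.0, 0.9, 0.9, 0.6, 0.6, 0.1, 0.1]   # illustrative kernel weights
for z in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(z, tail_non_adaptive_gauss(z, weights), bound_31(z, eps=1e-3))
```

For large $z$ the bound (3.1) is dominated by the level $\varepsilon$ of the propagation condition rather than by the exponential term.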



3.2. Separation property. For considerably different parameter values the corresponding adaptive weights become zero, see Proposition 3.3 below. To show this, we need (PS1) in Appendix A. This requires an appropriate choice of the constant $\varkappa > 1$ introduced in Lemma 2.3. The iteration step $k \in \{0, \dots, k^*\}$ will be specified in each case where the assumption is used.

Assumption A2 (Choice of $\varkappa$). Let $\varkappa > 1$ be sufficiently large such that the true parameter and its estimator satisfy $\theta_i, \hat\theta_i^{(k)} \in \Theta_\varkappa$ for all $i \in \{1, \dots, n\}$.
Remark 3.2. Suppose that $\varkappa$ satisfies $\theta_i \in \Theta_\varkappa$ for all $i \in \{1, \dots, n\}$. Then it holds with high probability, for sufficiently large iteration steps $k$, that $\hat\theta_i^{(k)} \in \Theta_\varkappa$, too. However, in Theorem 1 we require Assumption (A2) for all iteration steps. In order to ensure this, we could increase $\varkappa$, leading to a larger set $\Theta_\varkappa$, but this would weaken our theoretical results. Instead, we recommend a slight modification of the algorithm. We replace Equation (2.4) by

$$\hat\theta_i^{(k)} := \operatorname*{argmin}_{\theta' \in \Theta_\varkappa} \Big|\theta' - \sum_{j=1}^n w_{ij}^{(k)}\, T(Y_j) \,/\, N_i^{(k)}\Big|,$$

projecting the adaptive estimator into the set $\Theta_\varkappa$. This approach corresponds to Bayesian estimation with a priori knowledge $\theta_i \in \Theta_\varkappa$ for all $i \in \{1, \dots, n\}$. Analogously, we redefine the initial estimates via projection of the non-adaptive estimator into $\Theta_\varkappa$,

$$\hat\theta_i^{(0)} := \operatorname*{argmin}_{\theta' \in \Theta_\varkappa} \big|\theta' - \tilde\theta_i^{(0)}\big|.$$

Additionally, it might be advantageous to decrease the probability of $\hat\theta_i^{(0)} \neq \tilde\theta_i^{(0)}$ by choosing the initial bandwidth $h^{(0)}$ such that the neighborhood $U_i^{(0)}$ contains more design points than $X_i$ for each $i \in \{1, \dots, n\}$. Otherwise, the projection may change the adaptive weights in later iteration steps, leading to slightly shifted estimators. On the other hand, initialization with $U_i^{(0)} = \{X_i\}$ avoids smoothing among distinct homogeneous regions before adaptation starts.
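Since a compact and convex subset of $\mathbb{R}$ is a closed interval, the projection in Remark 3.2 amounts to clipping. The sketch uses the Poisson family as an example where this matters: with $U_i^{(0)} = \{X_i\}$, an observed count of zero gives $\tilde\theta_i^{(0)} = 0$, where $C(\theta) = \log\theta$ is undefined. The interval $[0.1, 50]$ standing in for $\Theta_\varkappa$ is a hypothetical choice.

```python
import math


def project(value, lower, upper):
    """argmin over t in [lower, upper] of |t - value|, i.e. clipping to the interval."""
    return min(max(value, lower), upper)


def kl_poisson(t1, t2):
    """KL(P_t1, P_t2) = t1 log(t1 / t2) - (t1 - t2); requires t1, t2 > 0."""
    return t1 * math.log(t1 / t2) - (t1 - t2)


THETA_KAPPA = (0.1, 50.0)            # hypothetical interval playing the role of Theta_kappa
raw_initial = [0.0, 3.0, 72.0]       # non-adaptive initial estimates with U_i^(0) = {X_i}
projected = [project(v, *THETA_KAPPA) for v in raw_initial]

# After projection every divergence, hence every penalty N_i KL(theta_i, theta_j), is finite.
kl_matrix = [[kl_poisson(a, b) for b in projected] for a in projected]
```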



The following proposition is similar to the first part of [Polzehl and Spokoiny, 2006, Theorem 5.9]. It implies that different homogeneous compartments with sufficiently large discontinuities will be separated by the algorithm. In particular, we will see that the lower bound for the discontinuities allowing exact separation of the distinct compartments depends mainly on the adaptation bandwidth $\lambda$ and the achieved quality of estimation in the previous iteration step.



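Proposition 3.3 (Separation property). Suppose Assumption (A1) and, at iteration step $k$, Assumption (A2). We consider two points $X_{i_1}$ and $X_{i_2}$ providing in iteration step $k$ the estimation accuracy $\mathcal{KL}(\hat\theta_{i_m}^{(k)}, \theta_{i_m}) \le z/N_{i_m}^{(k)}$ with some constant $z > 0$, $m = 1, 2$. If

$$\mathcal{KL}^{1/2}(\theta_{i_1}, \theta_{i_2}) > \varkappa\left(\sqrt{\lambda/N_{i_1}^{(k)}} + \sqrt{z/N_{i_1}^{(k)}} + \sqrt{z/N_{i_2}^{(k)}}\right), \tag{3.3}$$

then it holds $w_{i_1 i_2}^{(k+1)} = 0$.

Proof sketch. Due to the compact support of the adaptation kernel $K_{ad}$, it suffices to show that the statistical penalty introduced in Algorithm 1 satisfies $s_{i_1 i_2}^{(k+1)} > \lambda$. (PS1) in Appendix A yields for $\mathcal{KL}(\hat\theta_{i_m}^{(k)}, \theta_{i_m}) \le z/N_{i_m}^{(k)}$ with $m = 1, 2$ that

$$\mathcal{KL}^{1/2}(\hat\theta_{i_1}^{(k)}, \hat\theta_{i_2}^{(k)}) \ge \varkappa^{-1}\, \mathcal{KL}^{1/2}(\theta_{i_1}, \theta_{i_2}) - \sqrt{z/N_{i_1}^{(k)}} - \sqrt{z/N_{i_2}^{(k)}},$$

such that

$$s_{i_1 i_2}^{(k+1)} = N_{i_1}^{(k)}\, \mathcal{KL}(\hat\theta_{i_1}^{(k)}, \hat\theta_{i_2}^{(k)}) \ge N_{i_1}^{(k)} \left(\varkappa^{-1}\, \mathcal{KL}^{1/2}(\theta_{i_1}, \theta_{i_2}) - \sqrt{z/N_{i_1}^{(k)}} - \sqrt{z/N_{i_2}^{(k)}}\right)^2 > \lambda$$

by Equation (3.3). □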
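For Gaussian observations with unit variance, $I(\theta) \equiv 1$, so (2.2) holds with $\varkappa = 1$ on all of $\Theta$ and $\mathcal{KL}^{1/2}(\theta, \theta') = |\theta - \theta'|/\sqrt 2$ is a metric. In this special case the separation argument reduces to the triangle inequality: if $\mathcal{KL}^{1/2}(\theta_{i_1}, \theta_{i_2})$ exceeds $\sqrt{\lambda/N_{i_1}} + \sqrt{z/N_{i_1}} + \sqrt{z/N_{i_2}}$, then every pair of estimates with $\mathcal{KL}(\hat\theta_{i_m}, \theta_{i_m}) \le z/N_{i_m}$ yields a penalty above $\lambda$. The sketch below checks this for random estimates and for the worst case; all numbers are illustrative.

```python
import math
import random


def kl_gauss(a, b):
    """KL(N(a, 1), N(b, 1)) = (a - b)^2 / 2."""
    return (a - b) ** 2 / 2.0


def separation_threshold(lam, z, n1, n2, kappa=1.0):
    """kappa * (sqrt(lam/n1) + sqrt(z/n1) + sqrt(z/n2)) on the KL^{1/2} scale."""
    return kappa * (math.sqrt(lam / n1) + math.sqrt(z / n1) + math.sqrt(z / n2))


lam, z, n1, n2 = 10.0, 6.0, 40.0, 25.0
gap = 1.01 * separation_threshold(lam, z, n1, n2)   # KL^{1/2}(theta_1, theta_2)
theta_1, theta_2 = 0.0, math.sqrt(2.0) * gap        # so that sqrt(KL) equals gap

# Estimates anywhere in the accuracy balls KL(theta_hat_m, theta_m) <= z / N_m.
r1, r2 = math.sqrt(2.0 * z / n1), math.sqrt(2.0 * z / n2)
rng = random.Random(0)
candidates = [(theta_1 + r1, theta_2 - r2)]          # worst case: both moved inward
candidates += [(theta_1 + rng.uniform(-r1, r1), theta_2 + rng.uniform(-r2, r2))
               for _ in range(1000)]

# Statistical penalty s = N_1 KL(theta_hat_1, theta_hat_2) as in Algorithm 1.
penalties = [n1 * kl_gauss(a, b) for a, b in candidates]
print(min(penalties), lam)
```

Because every penalty exceeds $\lambda$ and $K_{ad}$ vanishes outside $[0, 1]$, the corresponding adaptive weight is zero.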
Remark 3.4. The lower bound (3.3) holds if

$$\min\big\{N_{i_1}^{(k)}, N_{i_2}^{(k)}\big\}^{1/2}\, \mathcal{KL}^{1/2}(\theta_{i_1}, \theta_{i_2}) > 3\varkappa \max\big\{\sqrt z, \sqrt\lambda\big\}.$$

This emphasizes the impact of the involved sample sizes.

3.3. Propagation and stability under local homogeneity. Next, we consider a locally homo- 
geneous setting with sharp discontinuities. In this case, smoothing is restricted to the homoge- 
neous compartments leading to similar results as under homogeneity, that is to propagation and 
to stability of estimates. 

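Assumption A3 (Structural assumption). There is a non-trivial partition $\mathcal{V} := \{V_j\}_j$ of $\mathcal{X}$ into maximal homogeneity compartments, i.e. for each $X_i \in \mathcal{X}$ there are a vicinity $V_i \in \mathcal{V}$ and a constant $\varphi_i > 0$ such that

$$\theta_j = \theta_i \ \text{ for all } X_j \in V_i, \qquad \mathcal{KL}(\theta_i, \theta_j) > \varphi_i^2 \ \text{ for all } X_j \notin V_i.$$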

We deduce the propagation property for the present case. Here, we should take into account that the considered neighborhood $U_i^{(k)}$ might be much larger than the respective homogeneity compartment $V_i$. Obviously, the divergence $\mathcal{KL}(\hat\theta_i^{(k)}, \theta_i)$ cannot converge with rate $1/\tilde N_i^{(k)}$ in this case. Therefore, we introduce the notion of the effective sample size $n_i^{(k)}$.

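Notation 3.5. We define for each $i \in \{1, \dots, n\}$ and $k \in \{0, \dots, k^*\}$ the effective sample size and its local minimum

$$n_i^{(k)} := \sum_{j:\, X_j \in V_i} \tilde w_{ij}^{(k)} \quad\text{and}\quad \underline{n}_i^{(k)} := \min_{j:\, X_j \in U_i^{(k)}} n_j^{(k)}. \tag{3.4}$$

As it turns out, the quantities $\underline{n}_i^{(k)}$ determine the minimal step sizes such that a discontinuity will be detected. During the first iteration steps it holds $n_i^{(k)} = \tilde N_i^{(k)}$. The quotient $n_i^{(k)}/\tilde N_i^{(k)}$ decreases when $U_i^{(k)}$ becomes larger than $V_i$.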
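The quantities in (3.4) are straightforward to compute for a one-dimensional design once the partition is known. The sketch below assumes that $n_i^{(k)}$ sums the non-adaptive location weights $\tilde w_{ij}^{(k)}$ over the compartment $V_i$, encodes the partition by integer labels, and uses $K_{loc}(u) = 1 - u^2$; the labels and bandwidths are illustrative.

```python
import numpy as np


def k_loc(u):
    """Location kernel 1 - u^2 on [0, 1]."""
    return np.where(u <= 1.0, 1.0 - u**2, 0.0)


def effective_sample_sizes(x, labels, h):
    """Return (n_i, underline n_i, N_tilde_i) for the location bandwidth h.

    labels[i] identifies the homogeneity compartment V_i containing X_i.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    dist = np.abs(x[:, None] - x[None, :])
    w = k_loc(dist / h)                                   # w_tilde_ij
    same = labels[:, None] == labels[None, :]             # X_j in V_i
    n_eff = (w * same).sum(axis=1)                        # n_i^(k)
    n_tilde = w.sum(axis=1)                               # N_tilde_i^(k)
    in_u = dist <= h                                      # X_j in U_i^(k)
    n_min = np.where(in_u, n_eff[None, :], np.inf).min(axis=1)   # underline n_i^(k)
    return n_eff, n_min, n_tilde


x = np.arange(1, 101)
labels = (x > 50).astype(int)          # two compartments {1..50} and {51..100}
n_eff, n_min, n_tilde = effective_sample_sizes(x, labels, h=4.0)
```

Near the discontinuity between $x = 50$ and $x = 51$ the ratio $n_i^{(k)}/\tilde N_i^{(k)}$ drops below one, while for small bandwidths both quantities coincide.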

In the following theorem, we consider the event

$$B^{(k)}(z) := \big\{n_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}, \theta_i) \le z \text{ for all } i\big\}, \quad z > 0.$$



Theorem 1 (Propagation property under local homogeneity). Suppose Assumptions (A1) and (A3) and, for all iteration steps $k \le k'$ with $k' \in \{0, \dots, k^*\}$ fixed, Assumption (A2). Let the bandwidth $\lambda$ be chosen in accordance with the propagation condition at level $\varepsilon$ for all $\theta_i$, $i \in \{1, \dots, n\}$. If for all $i \in \{1, \dots, n\}$ and every $k < k'$ the constants $\varphi_i > 0$ in Assumption (A3) satisfy

$$\varphi_i > \varkappa \cdots \tag{3.5}$$

then

$$P\big(B^{(k')}(z)\big) \ge 1 - (k' + 1) \max\{2ne^{-z}, n\varepsilon\}. \tag{3.6}$$
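Proof. Let $M^c$ denote the complement of the set $M$. Then it holds

$$P\big(B^{(k)}(z)\big) \ge 1 - n \max_i P\Big(\big\{n_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}, \theta_i) > z\big\} \cap B^{(k-1)}(z)\Big) - P\Big(\big(B^{(k-1)}(z)\big)^c\Big). \tag{3.7}$$

Due to (3.5), the conditions of Proposition 3.3 are satisfied on $B^{(k-1)}(z)$. Therefore, it follows on $B^{(k-1)}(z)$ that $w_{ij}^{(k)} = 0$ for all $X_j \notin U_i^{(k)} \cap V_i$. Hence, smoothing is restricted to the homogeneous compartment $V_i$ and $E\hat\theta_i^{(k)} = \theta_i$. We get with Proposition 3.1

$$P\Big(\big\{n_i^{(k)}\, \mathcal{KL}(\hat\theta_i^{(k)}, \theta_i) > z\big\} \cap B^{(k-1)}(z)\Big) \le \max\{2e^{-z}, \varepsilon\} \tag{3.8}$$

for all $k \in \{1, \dots, k'\}$. Now, we proceed by induction. Since $\hat\theta_i^{(0)} = \tilde\theta_i^{(0)}$ by Algorithm 1, it follows from (PS2) in Appendix A that

$$P\big(B^{(0)}(z)\big) \ge 1 - n \max_i P\big(n_i^{(0)}\, \mathcal{KL}(\tilde\theta_i^{(0)}, \theta_i) > z\big) \overset{(\mathrm{PS2})}{\ge} 1 - 2ne^{-z}.$$

Finally, Equations (3.7) and (3.8) lead for all $k \le k'$ to

$$P\big(B^{(k)}(z)\big) \ge 1 - n\max\{2e^{-z}, \varepsilon\} - k\max\{2ne^{-z}, n\varepsilon\} = 1 - (k + 1)\max\{2ne^{-z}, n\varepsilon\}.$$

This terminates the proof. □

Remark 3.6.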



■ In Equation (3.6), we observe an additional factor $(k' + 1)$, which appeared in the propagation property of [Polzehl and Spokoiny, 2006] as well, see Equation (3.10) in Section 3.4 below. This factor results from the proof only and might be avoidable. In particular, we notice that the given bound is not sharp, as we did not take advantage of the intersections of the sets $\big(B^{(k)}(z)\big)^c$ in Equation (3.7). The above theorem provides a meaningful result for $z \ge q \log(n)$ and $\varepsilon := c_\varepsilon n^{-q}$ with $c_\varepsilon > 0$ and $q > 1$.

■ Separation depends, via the statistical penalty, on the estimation quality of all data within the local neighborhood $U_i^{(k)}$. Therefore, the size of the smallest homogeneous compartment, measured by $\underline{n}_i^{(k)}$, determines the lower bound (3.5) for the discontinuities that provide an exact separation of the distinct homogeneous compartments. This bound is closely related to Equation (3.3), which involves only two points, such that the term in Equation (3.5) that depends on $\underline{n}_i^{(k)}$ can be replaced by the corresponding two-point term of Equation (3.3), having the same effect.
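To get a feeling for (3.6), the sketch evaluates its right-hand side with the choices $z = q \log(n)$ and $\varepsilon = c_\varepsilon n^{-q}$ from the first item above. The values $n = 1000$ and $k' = 33$ mirror the sample size and number of iterations of Section 2.4, while $q$ and $c_\varepsilon$ are illustrative.

```python
import math


def theorem1_lower_bound(k_prime, n, z, eps):
    """Right-hand side of (3.6): 1 - (k' + 1) max{2 n e^{-z}, n eps}."""
    return 1.0 - (k_prime + 1) * max(2.0 * n * math.exp(-z), n * eps)


def calibrated(n, q, c_eps):
    """z = q log(n) and eps = c_eps n^{-q}."""
    return q * math.log(n), c_eps * n ** (-q)


n, k_prime = 1000, 33
for q in (1.5, 2.0, 3.0):
    z, eps = calibrated(n, q, c_eps=1.0)
    print(q, theorem1_lower_bound(k_prime, n, z, eps))
```

For $q = 2$ the bound equals $1 - 34 \cdot 0.002 = 0.932$, whereas for $q = 1.5$ it is negative and hence vacuous for these values of $n$ and $k'$.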

Finally, we deduce a result similar to Equation (3.2) under local homogeneity. That is, we infer the estimation quality in step $k_2$ from the estimation quality in an earlier step $k_1 < k_2$. To this end, we apply again the separation property, see Proposition 3.3. This requires sure knowledge of the previously achieved estimation quality. Therefore, we consider the conditional probability and verify an exponential bound.
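Proposition 3.7 (Stability of estimates under local homogeneity). In the situation of Theorem 1 it holds for all $k_1, k_2 \in \{0, \dots, k^*\}$ with $k_1 < k_2 \le k'$ such that $(k_2 + 1)\max\{2ne^{-z}, n\varepsilon\} < 1$ that

$$P\big(B^{(k_2)}(z) \,\big|\, B^{(k_1)}(z)\big) \ge \frac{1 - (k_2 + 1)\max\{2ne^{-z}, n\varepsilon\}}{1 - (k_1 + 1)\max\{2ne^{-z}, n\varepsilon\}}. \tag{3.9}$$

Proof. The lower bound holds since

$$P\big(B^{(k_2)}(z) \,\big|\, B^{(k_1)}(z)\big) = 1 - \frac{P\big((B^{(k_2)}(z))^c \cap B^{(k_1)}(z)\big)}{P\big(B^{(k_1)}(z)\big)}$$

and furthermore

$$\begin{aligned} P\big((B^{(k_2)}(z))^c \cap B^{(k_1)}(z)\big) &= P\big((B^{(k_2)}(z))^c \cap B^{(k_2-1)}(z) \cap B^{(k_1)}(z)\big) + P\big((B^{(k_2)}(z))^c \cap (B^{(k_2-1)}(z))^c \cap B^{(k_1)}(z)\big)\\ &\le P\big((B^{(k_2)}(z))^c \cap B^{(k_2-1)}(z)\big) + P\big((B^{(k_2-1)}(z))^c \cap B^{(k_1)}(z)\big)\\ &\le \sum_{k=k_1+1}^{k_2} P\big((B^{(k)}(z))^c \cap B^{(k-1)}(z)\big). \end{aligned}$$

Additionally, we know from Equation (3.8), together with a union bound over $i \in \{1, \dots, n\}$, that

$$P\big((B^{(k)}(z))^c \cap B^{(k-1)}(z)\big) \le \max\{2ne^{-z}, n\varepsilon\}$$

for every $k \le k'$. Hence, we get from Equation (3.6) that

$$P\big(B^{(k_2)}(z) \,\big|\, B^{(k_1)}(z)\big) \ge 1 - \frac{(k_2 - k_1)\max\{2ne^{-z}, n\varepsilon\}}{1 - (k_1 + 1)\max\{2ne^{-z}, n\varepsilon\}} = \frac{1 - (k_2 + 1)\max\{2ne^{-z}, n\varepsilon\}}{1 - (k_1 + 1)\max\{2ne^{-z}, n\varepsilon\}},$$

leading to the assertion. □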

Remark 3.8. The assumptions on the choices of $k_1$ and $k_2$ ensure that the lower bound in Equation (3.9) is larger than zero and smaller than one. This lower bound for the conditional probability $P\big(B^{(k_2)}(z) \,|\, B^{(k_1)}(z)\big)$ improves the lower bound for $P\big(B^{(k_2)}(z)\big)$ in Theorem 1. However, this result allows a comparison of the established lower bounds only, but not of the exact probabilities.
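The comparison in Remark 3.8 can be made concrete. The sketch evaluates the right-hand side of (3.9) against the unconditional bound (3.6) for $k_2 = 33$ and several $k_1$, with the illustrative calibration $n = 1000$, $z = 2\log(n)$ and $\varepsilon = n^{-2}$.

```python
import math


def theorem1_lower_bound(k, n, z, eps):
    """1 - (k + 1) max{2 n e^{-z}, n eps}, cf. (3.6)."""
    return 1.0 - (k + 1) * max(2.0 * n * math.exp(-z), n * eps)


def conditional_lower_bound(k1, k2, n, z, eps):
    """Right-hand side of (3.9); requires k1 < k2 and (k2 + 1) m < 1."""
    m = max(2.0 * n * math.exp(-z), n * eps)
    if not (k1 < k2 and (k2 + 1) * m < 1.0):
        raise ValueError("assumptions of Proposition 3.7 violated")
    return (1.0 - (k2 + 1) * m) / (1.0 - (k1 + 1) * m)


n = 1000
z, eps = 2.0 * math.log(n), n ** -2.0
for k1 in (0, 10, 30):
    print(k1, conditional_lower_bound(k1, 33, n, z, eps), theorem1_lower_bound(33, n, z, eps))
```

The conditional bound grows as $k_1$ approaches $k_2$, reflecting that fewer iteration steps remain in which the event can fail.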



3.4. Relation to previous work. In the original study by Polzehl and Spokoiny [2006], the authors demonstrated propagation, separation and stability of estimates up to some constant. We will summarize these results briefly. All associated proofs were based on the memory step. In this study, we have shown similar properties for the simplified algorithm, where the memory step is removed. However, our results are restricted to locally constant parameter functions with sharp discontinuities. Theoretical properties of the algorithm in case of model misspecification will be analyzed in an upcoming study.



Both studies include a certain separation property, see Polzehl and Spokoiny [2006, Section 5.5] and Proposition 3.3. This justifies that, in case of sufficiently large discontinuities, smoothing is restricted to the homogeneity regions.
For the propagation property, Polzehl and Spokoiny supposed, among other things, the statistical independence of the adaptive weights from the observations. They then showed for $\theta(\cdot) \equiv \theta$ that

(3.10) $$\mathbb{P}\Big(N_i^{(k)}\mathcal{KL}\big(\hat\theta_i^{(k)}, \theta\big) \le \mu\log(n)\ \ \forall i\Big) \ge 1 - 2k/n, \qquad \mu > 2,$$

where $\hat\theta_i^{(k)}$ denotes the adaptive estimator after modification by the memory step, see Polzehl and Spokoiny [2006, Sections 3.2 and 3.3]. For locally almost constant parameters they established a similar result. Equation (3.10) can be improved by Proposition 3.1, taking advantage of the new propagation condition introduced in Section 2.3. Setting $z := \mu\log(n)$ and $\varepsilon := C_\varepsilon n^{-q}$, Proposition 3.1 implies

$$\mathbb{P}\Big(N_i^{(k)}\mathcal{KL}\big(\tilde\theta_i^{(k)}, \theta\big) \le \mu\log(n)\ \ \forall i\Big) \ge 1 - \max\{2/n,\, C_\varepsilon/n\}, \qquad \mu, q > 2,$$

where the additional factor $k$ is avoided. Theorem 1 sheds light on the interplay of propagation and separation during iteration. Here, we do not restrict the analysis to the respective homogeneous compartment as in Proposition 3.1 and Polzehl and Spokoiny [2006]. Instead, we use the separation property to verify the propagation property for piecewise constant functions with sharp discontinuities. The resulting exponential bound in Equation (3.6) complies with Equation (3.10) when setting $z \ge q\log(n)$ and $\varepsilon := C_\varepsilon n^{-q}$ with $C_\varepsilon > 0$ and $q > 2$.
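The gain from avoiding the factor $k$ can be made concrete with a few lines of arithmetic. The sketch below assumes, as quoted in Lemma 4.6, that Proposition 3.1 gives the pointwise bound $\max\{2e^{-z}, \varepsilon\}$ and applies a plain union bound over the $n$ design points; the function names and parameter values are hypothetical choices for illustration.

```python
import math


def pointwise_bound(z, eps):
    """Pointwise bound max{2 e^{-z}, eps}, as quoted from Proposition 3.1 in Lemma 4.6."""
    return max(2.0 * math.exp(-z), eps)


def uniform_failure_bound(n, mu, c_eps, q):
    """Union bound over n design points with z = mu log(n) and eps = c_eps n^(-q)."""
    z = mu * math.log(n)
    eps = c_eps * n ** (-q)
    return n * pointwise_bound(z, eps)      # = max{2 n^(1 - mu), c_eps n^(1 - q)}


def memory_step_failure_bound(n, k):
    """Failure probability 2k/n in (3.10); it grows with the iteration step k."""
    return 2.0 * k / n


n = 10 ** 4
print(uniform_failure_bound(n, mu=3.0, c_eps=1.0, q=3.0), memory_step_failure_bound(n, k=38))
```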



The results on stability of estimates are difficult to compare. Our corresponding results are stated in Propositions 3.1 and 3.7. Polzehl and Spokoiny proved, under weak assumptions, stability of estimates up to some constant. More precisely, they showed that

$$N_i^{(k)}\mathcal{KL}\big(\tilde\theta_i^{(k)}, \theta_i\big) \le \mu\log(n)$$

implies with probability one

$$N_i^{(k)}\mathcal{KL}\big(\hat\theta_i^{(k)}, \theta_i\big) \le c\log(n), \qquad c := \varkappa^2\big[\sqrt{c_1 c_r} + \sqrt{\mu}\big]^2,$$

where $\varkappa$ is as in Lemma 2.3, $c_r\log(n)$ denotes the bandwidth of the memory kernel, and $c_1 = c_1(u)$ depends on the constant $u$ satisfying $u_1 \le N_i^{(k-1)}/N_i^{(k)} \le u$ with $u_1, u \in (2/3, 1)$. Hence, the constant $c$ might be quite large. This result allowed them to verify the optimal rate of convergence under smoothness conditions on the parameter function $\theta(\cdot)$.



4. Discussion 



In this section, we examine the propagation condition more closely, discuss its application in practice and generalize the setting of our study.

4.1. (In-)dependence of the propagation condition on the parameter. The propagation condition in Definition 2.8 is formulated w.r.t. the unknown parameter $\theta \in \Theta$. In this section, we evaluate its dependence on this parameter. To this end, we start with a more general problem yielding a sufficient criterion. This criterion suggests that the propagation condition is independent of the parameter $\theta$ in case of Gaussian and exponential distributions and, as a consequence, of log-normal, Rayleigh, Weibull, and Pareto distributions. Additionally, we discuss the choice of $\lambda$ if the associated function $\mathfrak{z}_\lambda$ is not independent of the parameter $\theta$, where we concentrate on the Poisson distribution.
We introduce a general criterion for the independence of the composition of two functions of some parameter $\theta$.

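Proposition 4.1. Let $f\colon \Omega_f \to \mathbb{R}$ and $g\colon \Omega_g \to \mathbb{R}$ be continuously differentiable functions with open domains $\Omega_f, \Omega_g \subset \mathbb{R}^2$. Denote $\Omega^f_\theta := \{y : (y,\theta) \in \Omega_f\}$, $f_\theta\colon \Omega^f_\theta \to \mathbb{R}$ with $f_\theta(y) := f(y,\theta)$, and analogously $\Omega^g_\theta$ and $g_\theta$. Then, we suppose $\Omega^g_\theta \subset \Omega^f_\theta$ and $\partial g_\theta/\partial y > 0$, such that the composition $f_\theta \circ g_\theta^{-1}\colon g_\theta(\Omega^g_\theta) \to \mathbb{R}$ is well-defined. The function

$$h(z,\theta) := f_\theta\big(g_\theta^{-1}(z)\big), \qquad z \in g_\theta(\Omega^g_\theta),$$

is independent of $\theta$ if a variable $\zeta(y,\theta)$ and functions $\bar f$ and $\bar g$ exist such that

(4.1) $$\bar f(\zeta) = f_\theta(y) \quad\text{and}\quad \bar g(\zeta) = g_\theta(y).$$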

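Proof. Substitution with $y := g_\theta^{-1}(z)$ yields $h(g_\theta(y), \theta) = f(y,\theta)$ for $(y,\theta) \in \Omega_g$ and hence the total derivatives

$$\frac{\partial h}{\partial z}\frac{\partial g}{\partial y} = \frac{\partial f}{\partial y} \quad\text{and}\quad \frac{\partial h}{\partial z}\frac{\partial g}{\partial \theta} + \frac{\partial h}{\partial \theta} = \frac{\partial f}{\partial \theta}.$$

Then, it follows

$$\frac{\partial h}{\partial \theta} = \frac{\partial f}{\partial \theta} - \frac{\partial h}{\partial z}\frac{\partial g}{\partial \theta}.$$

This leads with

$$\frac{\partial h}{\partial z} = \frac{\partial f}{\partial y}\Big/\frac{\partial g_\theta}{\partial y}$$

and furthermore $\partial g_\theta/\partial y > 0$ to

$$\frac{\partial h}{\partial \theta} = \Big(\frac{\partial f}{\partial \theta}\frac{\partial g}{\partial y} - \frac{\partial f}{\partial y}\frac{\partial g}{\partial \theta}\Big)\Big/\frac{\partial g}{\partial y},$$

such that

$$\frac{\partial h}{\partial \theta} = 0 \iff \frac{\partial f}{\partial \theta}\frac{\partial g}{\partial y} = \frac{\partial f}{\partial y}\frac{\partial g}{\partial \theta}.$$

The chain rule implies with Equation (4.1) that indeed

$$\frac{\partial f}{\partial \theta}\frac{\partial g}{\partial y} = \frac{d\bar f}{d\zeta}\frac{\partial \zeta}{\partial \theta}\frac{d\bar g}{d\zeta}\frac{\partial \zeta}{\partial y} = \frac{d\bar f}{d\zeta}\frac{\partial \zeta}{\partial y}\frac{d\bar g}{d\zeta}\frac{\partial \zeta}{\partial \theta} = \frac{\partial f}{\partial y}\frac{\partial g}{\partial \theta},$$

yielding that $h$ is independent of $\theta$. □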
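The criterion in the proof can be checked numerically: $h$ does not depend on $\theta$ exactly when $\frac{\partial f}{\partial\theta}\frac{\partial g}{\partial y} - \frac{\partial f}{\partial y}\frac{\partial g}{\partial\theta}$ vanishes. The Python sketch below evaluates this expression by central differences for the Gaussian pair $(f, g)$ of Example 4.3 (on the branch $y > \theta$, where $\partial g_\theta/\partial y > 0$) and for a deliberately $\theta$-dependent counterexample `f_bad`. The value $\sigma = 1.5$, the evaluation points and all function names are arbitrary illustrative choices.

```python
import math

SIGMA = 1.5  # fixed standard deviation of the Gaussian family (arbitrary choice)


def g(y, theta):
    """g_theta(y) = KL(N(y, sigma^2), N(theta, sigma^2)) = (y - theta)^2 / (2 sigma^2)."""
    return (y - theta) ** 2 / (2.0 * SIGMA ** 2)


def f(y, theta):
    """Density of N(theta, sigma^2) at y divided by |dg_theta/dy| (branch y != theta)."""
    dens = math.exp(-(y - theta) ** 2 / (2.0 * SIGMA ** 2)) / (SIGMA * math.sqrt(2.0 * math.pi))
    return dens / (abs(y - theta) / SIGMA ** 2)


def f_bad(y, theta):
    """Counterexample: the spread depends on theta, so it is not a function of y - theta alone."""
    return math.exp(-(y - theta) ** 2 / (2.0 * theta ** 2))


def jacobian_gap(ff, gg, y, theta, h=1e-5):
    """Central differences for f_theta * g_y - f_y * g_theta (zero iff dh/dtheta = 0 when g_y != 0)."""
    f_y = (ff(y + h, theta) - ff(y - h, theta)) / (2.0 * h)
    f_t = (ff(y, theta + h) - ff(y, theta - h)) / (2.0 * h)
    g_y = (gg(y + h, theta) - gg(y - h, theta)) / (2.0 * h)
    g_t = (gg(y, theta + h) - gg(y, theta - h)) / (2.0 * h)
    return f_t * g_y - f_y * g_t


print(jacobian_gap(f, g, 2.0, 1.0), jacobian_gap(f_bad, g, 2.0, 1.0))
```

For the Gaussian pair both $f$ and $g$ depend on $(y,\theta)$ only through $\zeta = y - \theta$, so the gap is zero up to rounding; for `f_bad` it is clearly non-zero.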



Now, we are well prepared to evaluate the (in-)dependence of the propagation condition in Definition 2.8, and hence of the choice of $\lambda$, on the parameter $\theta$. The estimator is defined as a linear combination of the terms $T(Y_j)$, where the adaptive and the non-adaptive estimator differ only in the definition of the weights. Thus, we approach the problem in three steps. We start from the special case where the estimator is restricted to a single point $T(Y_j)$. Then, we consider the non-adaptive estimator, describing its probability density as the convolution of the densities corresponding to the weighted observations. Here, we take advantage of the statistical independence of the involved random variables $w_{ij}^{(k)}T(Y_j)/N_i^{(k)}$. In case of the adaptive estimator we cannot follow the same approach. This would require knowledge about the probability distribution of the random variables $\tilde w_{ij}^{(k)}T(Y_j)/\tilde N_i^{(k)}$, where the adaptive weights follow an unknown distribution. Further, these variables are not statistically independent. To compensate for the resulting lack of a theoretical proof, we illustrate by simulations that the adaptive estimator shows almost the same behavior as the non-adaptive estimator if the propagation condition is satisfied. This suggests that the probability distribution of $\mathcal{KL}(\tilde\theta_i^{(k)}, \theta)$ is independent of $\theta$ if the same holds true for the non-adaptive estimator. The single observation case is treated first.

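Lemma 4.2. Let $\mathcal{P} = (P_\theta)_{\theta\in\Theta}$ with $\Theta \subset \mathbb{R}$ be a parametric family of continuous probability distributions. Suppose that $Y \sim P_\theta$ and $T(Y) \in \Theta$ almost surely, and that the density $f_Y$ of $Y$ is continuously differentiable. Consider the random variable $Z := g_\theta(Y) := \mathcal{KL}(P_{T(Y)}, P_\theta)$ and assume that $\partial g_\theta/\partial y \ne 0$. The density $f_Z$ of $Z$ is independent of the parameter $\theta$ if a variable $\zeta(y,\theta)$ and functions $\bar f$ and $\bar g$ exist such that

(4.2) $$\bar f(\zeta) = f_Y(y)\,\Big|\frac{\partial g_\theta}{\partial y}(y)\Big|^{-1} \quad\text{and}\quad \bar g(\zeta) = g_\theta(y).$$

Proof. The assertion follows with

$$h(z,\theta) := f_Z(z) = f_Y\big(g_\theta^{-1}(z)\big)\,\Big|\frac{\partial g_\theta}{\partial y}\big(g_\theta^{-1}(z)\big)\Big|^{-1}$$

as a special case of Proposition 4.1, since $P_\theta\big(T(Y) \in \Theta\big) = 1$. □

This lemma yields the desired results for Gaussian and Gamma-distributed observations.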



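Example 4.3. We consider the same setting as in Lemma 4.2. In the following cases, the density of $Z$ is independent of the parameter $\theta$.

• $\mathcal{P} = \{\mathcal{N}(\theta, \sigma^2)\}_{\theta\in\Theta}$ with $\sigma > 0$ fixed: The Kullback-Leibler formula from Section 2 and Table 1 yield for $P_\theta, P_{\theta'} \in \mathcal{P}$ the explicit formula $\mathcal{KL}(\theta, \theta') = (\theta - \theta')^2/(2\sigma^2)$. Since $f_Y(y) = \exp\big(-(y-\theta)^2/(2\sigma^2)\big)/(\sigma\sqrt{2\pi})$ and $\frac{\partial g_\theta}{\partial y}(y) = (y-\theta)/\sigma^2$, we get the independence of $\theta$ from Lemma 4.2 by setting
$$\zeta(y,\theta) := y - \theta, \qquad \bar f(\zeta) := \frac{\sigma}{|\zeta|\sqrt{2\pi}}\exp\Big(-\frac{\zeta^2}{2\sigma^2}\Big) \qquad\text{and}\qquad \bar g(\zeta) := \frac{\zeta^2}{2\sigma^2}.$$

• $\mathcal{P} = \{\Gamma(p, \theta)\}_{\theta\in\Theta}$ with $p > 0$ fixed: It holds $f_Y(y) = y^{p-1}e^{-y/\theta}/(\theta^p\Gamma(p))$, $\mathcal{KL}(\theta,\theta') = p\,[\theta/\theta' - 1 - \ln(\theta/\theta')]$ and $\frac{\partial g_\theta}{\partial y}(y) = p\,(1/\theta - 1/y)$. Thus, Lemma 4.2 can be applied with
$$\zeta(y,\theta) := \frac{y}{\theta}, \qquad \bar f(\zeta) := \frac{\zeta^p e^{-\zeta}}{p\,|\zeta - 1|\,\Gamma(p)} \qquad\text{and}\qquad \bar g(\zeta) := p\,[\zeta - 1 - \ln\zeta].$$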
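The Gamma case of Example 4.3 can be illustrated by simulation: with $\zeta = Y/\theta$, the variable $Z = p[\zeta - 1 - \ln\zeta]$ is a pivot. The sketch below draws $Y$ with Python's `random.gammavariate` (shape $p$, scale $\theta$, which matches the density $y^{p-1}e^{-y/\theta}/(\theta^p\Gamma(p))$ above) for two very different values of $\theta$ and compares empirical quantiles of $Z$. Sample size, seeds and the comparison tolerance are arbitrary choices, and the helper names are hypothetical.

```python
import math
import random


def gamma_kl(p, y, theta):
    """KL(Gamma(p, y), Gamma(p, theta)) = p [y/theta - 1 - ln(y/theta)], scale parametrization."""
    r = y / theta
    return p * (r - 1.0 - math.log(r))


def sample_z(p, theta, size, seed):
    """Sorted draws of Z = g_theta(Y) with Y ~ Gamma(shape p, scale theta)."""
    rng = random.Random(seed)
    return sorted(gamma_kl(p, rng.gammavariate(p, theta), theta) for _ in range(size))


def quantile(sorted_values, u):
    """Simple empirical quantile (lower order statistic)."""
    return sorted_values[int(u * (len(sorted_values) - 1))]


z_small = sample_z(2.0, 1.0, 50_000, seed=1)
z_large = sample_z(2.0, 25.0, 50_000, seed=2)
for u in (0.25, 0.5, 0.9):
    print(u, quantile(z_small, u), quantile(z_large, u))
```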



This extends to non-adaptive linear combinations as follows. Lemma 4.2 can be applied w.r.t. the non-adaptive estimator with $Y := \bar\theta_i^{(k)}$, considering the composition of the density $f_{\bar\theta_i^{(k)}}$ and the Kullback-Leibler divergence described by the function $g_\theta$. While the latter depends on the assumed parametric family $\mathcal{P}$ only, the density $f_{\bar\theta_i^{(k)}}$ is determined via convolution of the probability densities of $w_{ij}^{(k)}T(Y_j)/N_i^{(k)}$, where $Y_j \sim P_\theta \in \mathcal{P}$. Hence, it depends directly on the function $T(\cdot)$ introduced in Assumption (A1).

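Theorem 2. Let $\mathcal{P} = (P_\theta)_{\theta\in\Theta}$, $\Theta \subset \mathbb{R}$, be a parametric family of probability distributions. We consider the random variable

$$Z := \mathcal{KL}\big(\bar\theta_i^{(k)}, \theta\big),$$

where $\bar\theta_i^{(k)}$ denotes the non-adaptive estimator depending on the observations $Y_j \overset{\text{iid}}{\sim} P_\theta$ with $j \in \{1, \ldots, n\}$ and some $\theta \in \Theta$. The density of $Z$ is independent of the parameter $\theta$ in the following cases:

• $\mathcal{P} = \{\mathcal{N}(\theta, \sigma^2)\}_{\theta\in\Theta}$ with $\sigma > 0$ fixed;
• $\mathcal{P} = \{\log\mathcal{N}(\theta, \sigma^2)\}_{\theta\in\Theta}$ with $\sigma > 0$ fixed;
• $\mathcal{P} = \{\mathrm{Exp}(1/\theta)\}_{\theta\in\Theta}$;
• $\mathcal{P} = \{\mathrm{Rayleigh}(\theta)\}_{\theta\in\Theta}$;
• $\mathcal{P} = \{\mathrm{Weibull}(\theta, k)\}_{\theta\in\Theta}$ with $k > 0$;
• $\mathcal{P} = \{\mathrm{Pareto}(x_m, \theta)\}_{\theta\in\Theta}$ with $x_m > 1$.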

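Proof. The non-adaptive estimator is defined as the weighted mean of $T(Y_j)$ with $j = 1, \ldots, n$. We get from Table 1 that

• $T(Y) = \ln(Y) \sim \mathcal{N}(\theta, \sigma^2)$ if $Y \sim \log\mathcal{N}(\theta, \sigma^2)$;
• $T(Y) = Y^2 \sim \mathrm{Exp}\big(1/(2\theta^2)\big)$ if $Y \sim \mathrm{Rayleigh}(\theta)$;
• $T(Y) = Y^k \sim \mathrm{Exp}(1/\theta^k)$ if $Y \sim \mathrm{Weibull}(\theta, k)$ with $k > 0$;
• $T(Y) = \ln(Y/x_m) \sim \mathrm{Exp}(\theta)$ if $Y \sim \mathrm{Pareto}(x_m, \theta)$.

Hence, in each of these cases, the non-adaptive estimator follows the same distribution as for Gaussian or exponentially distributed observations. Additionally, the corresponding Kullback-Leibler divergences coincide with the respective divergences of Gaussian or exponential distributions. Therefore, it suffices to consider the Gaussian and the exponential distribution.

In the Gaussian case, it follows from the statistical independence of the observations $Y_j \sim \mathcal{N}(\theta, \sigma^2)$ that

$$\bar\theta_i^{(k)} \sim \mathcal{N}\big(\theta, \sigma_i^2\big), \qquad\text{where}\quad \sigma_i^2 := \sigma^2\sum_{j=1}^n \big(w_{ij}^{(k)}/N_i^{(k)}\big)^2.$$

Hence, the non-adaptive estimator is again Gaussian distributed and the independence of $\theta$ follows analogously to Example 4.3, where $\zeta$ and $\bar g$ remain unchanged and

$$\bar f(\zeta) := \frac{\sigma^2}{\sigma_i|\zeta|\sqrt{2\pi}}\exp\Big(-\frac{\zeta^2}{2\sigma_i^2}\Big).$$

Next, we consider the exponential distribution, supposing $Y_j \sim \mathrm{Exp}(1/\theta)$. We distinguish two cases. First, if all non-zero weights are equal, and hence $w_{ij}^{(k)} \in \{0, 1\}$ as $w_{ii}^{(k)} = 1$ for all $k$, then the non-adaptive estimator $\bar\theta_i^{(k)}$ is Gamma-distributed, i.e.

$$\bar\theta_i^{(k)} \sim \Gamma\big(N_i^{(k)}, \theta/N_i^{(k)}\big).$$

This yields the desired independence of $\theta$ via Example 4.3, setting $Y := \bar\theta_i^{(k)}$. Next, in the general case, we require the existence of non-zero weights $w_{ij}^{(k)} \ne w_{ij'}^{(k)}$ with $j, j' \in \{1, \ldots, n\}$. If $Y_j \sim \mathrm{Exp}(1/\theta)$, then it holds $a_jY_j \sim \mathrm{Exp}\big(1/(\theta a_j)\big)$ for all $a_j > 0$, where we denote $a_j := w_{ij}^{(k)}/N_i^{(k)}$ for the sake of simplicity. The linear combination $Y := a_1Y_1 + a_2Y_2$ with $a_1 \ne a_2$ has the density

$$\begin{aligned}
f(y) &= \big(f_{a_1Y_1} * f_{a_2Y_2}\big)(y) = \int_0^y \frac{1}{\theta a_1}e^{-\frac{y-z}{\theta a_1}}\,\frac{1}{\theta a_2}e^{-\frac{z}{\theta a_2}}\,dz
= \frac{1}{\theta^2 a_1a_2}\,e^{-\frac{y}{\theta a_1}}\int_0^y e^{-z\frac{a_1-a_2}{\theta a_1a_2}}\,dz\\
&= \frac{1}{\theta(a_1 - a_2)}\Big(e^{-\frac{y}{\theta a_1}} - e^{-\frac{y}{\theta a_2}}\Big)
= \frac{a_1}{a_1 - a_2}\cdot\frac{1}{\theta a_1}e^{-\frac{y}{\theta a_1}} - \frac{a_2}{a_1 - a_2}\cdot\frac{1}{\theta a_2}e^{-\frac{y}{\theta a_2}},
\end{aligned}$$

which is a weighted sum of the component densities. Therefore, this extends to the more general case $Y := a_1Y_1 + \ldots + a_mY_m$ with $a_j \ne a_{j'}$ for all $j \ne j'$. Including the case of equal weights $a_j = a_{j'}$ for some $j, j' \in \{1, \ldots, n\}$, we conclude that

$$f_{\bar\theta_i^{(k)}} = \sum_j c_j f_j,$$

where the constants $c_j \in \mathbb{R}$ again depend on $a_1, \ldots, a_m$ only. The densities $f_j$ follow the distribution $\Gamma(m_j, \theta a_j)$, where $m_j$ denotes the number of observations $Y_{j'}$ with weights $a_{j'} = a_j$. Thus, we get from Example 4.3 the independence of $\theta$ for each summand $c_jf_j$, yielding the assertion for weighted sums of exponentials. □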
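The closed-form density of $a_1Y_1 + a_2Y_2$ derived in the proof can be checked against a direct numerical evaluation of the convolution integral. The following Python sketch (hypothetical function names; $\theta$, $a_1$ and $a_2$ chosen arbitrarily) uses the composite Simpson rule for the integral over $[0, y]$.

```python
import math


def exp_density(y, mean):
    """Density of an exponential law with the given mean, for y >= 0."""
    return math.exp(-y / mean) / mean


def closed_form(y, theta, a1, a2):
    """Weighted sum of component densities derived above (requires a1 != a2)."""
    w1 = a1 / (a1 - a2)
    w2 = -a2 / (a1 - a2)            # w1 + w2 = 1; one of the two weights is negative
    return w1 * exp_density(y, theta * a1) + w2 * exp_density(y, theta * a2)


def numeric_convolution(y, theta, a1, a2, steps=4000):
    """Composite Simpson rule for int_0^y f_{a1 Y1}(y - z) f_{a2 Y2}(z) dz (steps even)."""
    h = y / steps
    total = 0.0
    for i in range(steps + 1):
        z = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        # max(..., 0.0) guards against a tiny negative argument caused by rounding at z = y
        total += weight * exp_density(max(y - z, 0.0), theta * a1) * exp_density(z, theta * a2)
    return total * h / 3.0


for y in (0.1, 1.0, 3.0):
    print(y, closed_form(y, 1.3, 0.5, 0.2), numeric_convolution(y, 1.3, 0.5, 0.2))
```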



Remark 4.4. We know from Example 4.3 that the probability distribution of the random variable $\mathcal{KL}(P_{T(Y_j)}, P_\theta)$ is independent of the parameter $\theta$ if the observations follow a Gamma distribution. However, the probability distribution of the corresponding non-adaptive estimator has a quite sophisticated form [Mathai, 1982, Moschopoulos, 1985], where the corresponding summands could not be proven to be independent of $\theta$. Still, in case of a location kernel that attains only values in $\{0, 1\}$ we get

$$Y_j \sim \Gamma(p, \theta) \;\Rightarrow\; \bar\theta_i^{(k)} \sim \Gamma\big(N_i^{(k)}p,\, \theta/N_i^{(k)}\big) \qquad\text{if } w_{ij}^{(k)} \in \{0, 1\} \text{ for all } j.$$

This yields via Example 4.3 the independence of $\theta$. The same holds true for the Erlang and the scaled chi-squared distribution, since

$$\mathrm{Erlang}(n, 1/\theta) = \Gamma(n, \theta) \qquad\text{and}\qquad Y \sim \Gamma(k/2, 2\theta/k) \;\Leftrightarrow\; kY/\theta \sim \chi^2(k) = \Gamma(k/2, 2).$$
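The two distributional identities in Remark 4.4 can be sanity-checked through their first two moments. The Python sketch below simulates the average of $N$ iid $\Gamma(p,\theta)$ observations (weights in $\{0,1\}$), whose claimed law $\Gamma(Np, \theta/N)$ has mean $p\theta$ and variance $p\theta^2/N$, and the variable $kY/\theta$ with $Y \sim \Gamma(k/2, 2\theta/k)$, whose claimed law $\chi^2(k)$ has mean $k$ and variance $2k$. A moment check is only a consistency check, not a proof; parameter values, seeds and helper names are hypothetical.

```python
import random
import statistics


def box_kernel_means(p, theta, n_obs, reps, seed):
    """Non-adaptive estimates with weights in {0, 1}: averages of n_obs iid Gamma(p, theta)."""
    rng = random.Random(seed)
    # random.gammavariate(alpha, beta) uses shape alpha and scale beta
    return [sum(rng.gammavariate(p, theta) for _ in range(n_obs)) / n_obs for _ in range(reps)]


def scaled_chi2(k, theta, reps, seed):
    """Draws of k Y / theta with Y ~ Gamma(k/2, 2 theta / k); should behave like chi^2(k)."""
    rng = random.Random(seed)
    return [k * rng.gammavariate(k / 2, 2 * theta / k) / theta for _ in range(reps)]


means = box_kernel_means(p=1.5, theta=2.0, n_obs=5, reps=20_000, seed=3)
# Gamma(N p, theta / N) has mean p * theta = 3.0 and variance p * theta^2 / N = 1.2
print(statistics.fmean(means), statistics.variance(means))
chi = scaled_chi2(k=4, theta=3.0, reps=20_000, seed=4)
# chi^2(4) has mean 4 and variance 8
print(statistics.fmean(chi), statistics.variance(chi))
```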



The new propagation condition is included in the R-package aws [Polzehl, 2012]. First tests yield smaller values of the adaptation bandwidth $\lambda$ than the previous version of the propagation condition, hence allowing for better smoothing results with a smaller estimation bias.
Figure 3. Plots of the propagation condition for the Gaussian distribution with (from left to right) $\lambda = 22.4, 13.6, 9.72$. The isolines of the probability $p$ for values between $10^{\ldots}$ and 0.5 are plotted w.r.t. the location bandwidth, described by the iteration step $k$, and the corresponding value $z = \mathfrak{z}_\lambda(k, p; \theta = 1)$. The black solid lines represent the isolines of the adaptive estimator, the red dotted lines correspond to the non-adaptive estimator.



In Figures 3 and 4 we show some examples to illustrate the close relation of the adaptive and the non-adaptive estimator under a satisfied propagation condition. Both Theorem 2 and the numerical simulations suggest that the propagation condition is independent of the parameter $\theta$.

The plots have been created using the function awstestprop on a two-dimensional design with $5000 \times 5000$ points and the same kernels as in Equation (2.5). The maximal location bandwidth $h^{(k^*)}$ was set to 50, requiring 38 iteration steps. Running the simulation with different parameters $\theta$ yields exactly the same plots. In Figure 3 we show the results for the Gaussian distribution with three different values of $\lambda$. In Figure 4 we consider the same setting w.r.t. the exponential distribution.
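The interface of awstestprop is not reproduced here. Instead, the following self-contained Python sketch mimics the underlying comparison on a one-dimensional constant signal: it runs a toy version of the simplified iteration (without memory step) once with the statistical penalty switched off (non-adaptive) and once with a finite $\lambda$, and reports the relative frequency of exceedances $N_i\mathcal{KL}(\cdot, \theta) > z$ at interior points. The kernels $K_{\mathrm{loc}}(u) = \max(0, 1-u^2)$ and $K_{\mathrm{st}}(s) = \max(0, 1-s)$, the penalty $s_{ij} = N_i\mathcal{KL}(\tilde\theta_i, \tilde\theta_j)/\lambda$, the bandwidths $1.25^k$ and all names are assumptions for illustration; they are not taken from Equation (2.5) or from aws.

```python
import math
import random


def kl_gauss(a, b, sigma):
    """KL divergence between N(a, sigma^2) and N(b, sigma^2)."""
    return (a - b) ** 2 / (2.0 * sigma ** 2)


def toy_ps_1d(y, sigma, lam, bandwidths):
    """Toy simplified Propagation-Separation iteration (no memory step) on a 1-D grid.

    Assumed kernels: K_loc(u) = max(0, 1 - u^2), K_st(s) = max(0, 1 - s), with the
    penalty s_ij = N_i KL(theta_i, theta_j) / lam from the previous step.
    lam = math.inf switches the statistical penalty off (non-adaptive estimator).
    Returns the final estimates and the sums of weights N_i.
    """
    n = len(y)
    theta, big_n = list(y), [1.0] * n           # step 0: theta_i = Y_i, N_i = 1
    for h in bandwidths:
        new_theta, new_n = [], []
        r = int(h)
        for i in range(n):
            num = den = 0.0
            for j in range(max(0, i - r), min(n, i + r + 1)):
                w = max(0.0, 1.0 - ((i - j) / h) ** 2)            # location kernel (assumed)
                if math.isfinite(lam):
                    s = big_n[i] * kl_gauss(theta[i], theta[j], sigma) / lam
                    w *= max(0.0, 1.0 - s)                        # statistical kernel (assumed)
                num += w * y[j]
                den += w
            new_theta.append(num / den)
            new_n.append(den)
        theta, big_n = new_theta, new_n         # synchronous update for all i
    return theta, big_n


def exceedance(theta_hat, big_n, theta_true, sigma, z, margin):
    """Relative frequency of N_i KL(theta_hat_i, theta_true) > z over interior points."""
    inner = range(margin, len(theta_hat) - margin)
    hits = sum(big_n[i] * kl_gauss(theta_hat[i], theta_true, sigma) > z for i in inner)
    return hits / len(inner)


rng = random.Random(0)
y = [rng.gauss(0.0, 1.0) for _ in range(600)]
bandwidths = [1.25 ** k for k in range(1, 12)]
non_adaptive = toy_ps_1d(y, 1.0, math.inf, bandwidths)
adaptive = toy_ps_1d(y, 1.0, 8.0, bandwidths)
for est in (non_adaptive, adaptive):
    print(exceedance(*est, theta_true=0.0, sigma=1.0, z=1.0, margin=12))
```

For very large $\lambda$ both runs coincide; how close the two exceedance frequencies remain for finite $\lambda$ is what the propagation condition is meant to control.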

Finally, we discuss how to proceed if the function $\mathfrak{z}_\lambda$ depends on the parameter $\theta$. We want to ensure that our choice of the adaptation bandwidth $\lambda$ is in accordance with the propagation condition for all $\theta_i$, $i \in \{1, \ldots, n\}$. Of course, we do not know the exact parameters $\{\theta_i\}_i$. Instead, we could analyze the monotonicity of the optimal choice $\lambda_{\mathrm{opt}}(\varepsilon, \theta)$, see Remark 2.9, for a fixed constant $\varepsilon > 0$ and varying parameters $\theta \in \Theta$. For the sake of simplicity, we prefer to determine, for a fixed adaptation bandwidth $\lambda$ and varying parameters $\theta$, for which probabilities $p$ the propagation condition is satisfied. This can be done with the function awstestprop in the R-package aws. Thus, we get for every $\theta$ the corresponding value $\varepsilon_\lambda(\theta)$. Then, $\varepsilon_\lambda(\theta) > \varepsilon_\lambda(\theta')$ indicates that the parameter $\theta$ requires a larger adaptation bandwidth than the parameter $\theta'$.
Figure 4. Plots of the propagation condition for the exponential distribution with (from left to right) $\lambda = 13.2, 10.2, 8.78$.

Taking the range of our observations into account, we attempt to identify a finite number of parameters $\theta^* \in \Theta$ such that every $\lambda$ that satisfies the propagation condition for these parameters $\theta^*$ remains valid with high probability for the unknown parameters $\theta_i$, $i \in \{1, \ldots, n\}$.

For observations following a Poisson distribution, it turned out that different parameters $\theta$ yield comparable propagation levels $\varepsilon_\lambda(\theta)$, even though the resulting isolines differ clearly. This is illustrated in Figure 5, where we consider the same kernels as in Equation (2.5), a regular design with $5000 \times 5000$ points, and $h^{(k^*)} = 50$, i.e. 38 iteration steps. In case of Bernoulli-distributed observations, it seems advisable to ensure the propagation condition for $\theta^* := 0.5$. In both cases, the implemented algorithm prevents the Kullback-Leibler divergence from becoming infinite by slightly shifting the estimator.
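The dependence on $\theta$ in the Poisson case is already visible for the non-adaptive estimator. With weights equal to one, $N\bar\theta = \sum_j Y_j$ is Poisson with mean $N\theta$, so the exceedance probability $\mathbb{P}(N\mathcal{KL}(\bar\theta, \theta) > z)$ can be summed exactly from the probability mass function. The Python sketch below (hypothetical helper names; $N = 4$ and $z = 1$ chosen for illustration) compares it for two parameters with the Gaussian value $\operatorname{erfc}(\sqrt{z})$, which does not depend on $\theta$.

```python
import math


def kl_poisson(a, b):
    """KL(Poiss(a), Poiss(b)) = a ln(a/b) - a + b, with the convention 0 ln 0 = 0."""
    return (a * math.log(a / b) if a > 0 else 0.0) - a + b


def poisson_exceedance(theta, n_obs, z):
    """Exact P(N KL(theta_bar, theta) > z) for the average of n_obs iid Poiss(theta)."""
    lam = n_obs * theta                      # the sum S of the observations is Poiss(n_obs * theta)
    s_max = int(lam + 20.0 * math.sqrt(lam) + 50)
    prob = 0.0
    for s in range(s_max + 1):
        if n_obs * kl_poisson(s / n_obs, theta) > z:
            prob += math.exp(-lam + s * math.log(lam) - math.lgamma(s + 1))
    return prob


def gaussian_exceedance(z):
    """Same probability for Gaussian observations: N KL = chi^2_1 / 2, whatever theta is."""
    return math.erfc(math.sqrt(z))


for theta in (0.1, 5.0):
    print(theta, poisson_exceedance(theta, 4, 1.0), gaussian_exceedance(1.0))
```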



4.2. The propagation condition in practice. The propagation condition is based on the function $\mathfrak{z}_\lambda$. This depends on the probability $\mathbb{P}\big(N_i^{(k)}\mathcal{KL}(\tilde\theta_i^{(k)}(\lambda), \theta) > z\big)$, which cannot be calculated exactly. Therefore, in practice, we need an appropriate approximation. This can be achieved by the relative frequency of design points $X_i \in \mathcal{X}$ with $N_i^{(k)}\mathcal{KL}(\tilde\theta_i^{(k)}(\lambda), \theta) > z$, as we discuss in Definition 4.5 and Lemma 4.6. In order to avoid boundary effects, we restrict the approximation to the interior of the design space, that is, to all points $X_i \in \mathcal{X}$ where the final neighborhood $U_i^{(k^*)}$ is not restricted by the boundaries of the considered compartment. This subset of $\{X_i\}_{i=1}^n$ is denoted by $\mathcal{X}^\circ$. Without loss of generality, we assume that $\mathcal{X}^\circ = \{X_i\}_{i=1}^{n_0}$ for some $n_0 \le n$.



Definition 4.5 (Approximation). We consider the same setting as in Definition 2.8 and set

$$M^{(k)}(z) := \big\{X_i \in \mathcal{X}^\circ : N_i^{(k)}\mathcal{KL}\big(\tilde\theta_i^{(k)}(\lambda), \theta\big) > z\big\}.$$
Then we define the following estimator

(4.3) $$\hat p^{(k)}(z) := \frac{1}{n_0}\sum_{i=1}^{n_0} \mathbb{1}_{M^{(k)}(z)}(X_i),$$

where $\mathbb{1}$ denotes the indicator function with $\mathbb{1}_M(x) = 1$ if $x \in M$ and $\mathbb{1}_M(x) = 0$ otherwise.
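Lemma 4.6. We consider the same setting as in Definition 2.8 and suppose the conditions of Proposition 3.1 to be satisfied. Then, it holds for each $j \in \{1, \ldots, n_0\}$ that

$$\Big|\mathbb{E}\big[\hat p^{(k)}(z)\big] - \mathbb{P}\big(N_j^{(k)}\mathcal{KL}(\tilde\theta_j^{(k)}(\lambda), \theta) > z\big)\Big| \le \max\{2e^{-z}, \varepsilon\}$$

and

(4.4) $$\mathrm{Var}\big[\hat p^{(k)}(z)\big] \le \max\{2e^{-z}, \varepsilon\}.$$

Proof. It holds

$$\begin{aligned}
\Big|\mathbb{E}\big[\hat p^{(k)}(z)\big] - \mathbb{P}\big(N_j^{(k)}\mathcal{KL}(\tilde\theta_j^{(k)}(\lambda), \theta) > z\big)\Big|
&= \Big|\frac{1}{n_0}\sum_{i=1}^{n_0}\mathbb{P}\big(N_i^{(k)}\mathcal{KL}(\tilde\theta_i^{(k)}(\lambda), \theta) > z\big) - \mathbb{P}\big(N_j^{(k)}\mathcal{KL}(\tilde\theta_j^{(k)}(\lambda), \theta) > z\big)\Big|\\
&\le \max_{i\in\{1,\ldots,n_0\}}\mathbb{P}\big(N_i^{(k)}\mathcal{KL}(\tilde\theta_i^{(k)}(\lambda), \theta) > z\big)
\overset{\text{Prop. 3.1}}{\le} \max\{2e^{-z}, \varepsilon\},
\end{aligned}$$

since both terms of the difference lie between zero and this maximum. Furthermore, we get

$$\mathrm{Var}\big[\hat p^{(k)}(z)\big] = \frac{1}{n_0^2}\sum_{i=1}^{n_0}\sum_{i'=1}^{n_0}\mathrm{Cov}\big(\mathbb{1}_{M^{(k)}(z)}(X_i), \mathbb{1}_{M^{(k)}(z)}(X_{i'})\big) \le \max_{i\in\{1,\ldots,n_0\}}\mathrm{Var}\big[\mathbb{1}_{M^{(k)}(z)}(X_i)\big],$$

where we used $|\mathrm{Cov}(U, V)| \le \big(\mathrm{Var}[U]\,\mathrm{Var}[V]\big)^{1/2}$. Obviously, it holds for any random variable $X$ with values in $[0, 1]$ that $\mathrm{Var}[X] \le \mathbb{E}[X]$. By definition of $M^{(k)}(z)$, this yields

$$\max_{i\in\{1,\ldots,n_0\}}\mathrm{Var}\big[\mathbb{1}_{M^{(k)}(z)}(X_i)\big] \le \max_{i\in\{1,\ldots,n_0\}}\mathbb{P}\big(N_i^{(k)}\mathcal{KL}(\tilde\theta_i^{(k)}(\lambda), \theta) > z\big) \overset{\text{Prop. 3.1}}{\le} \max\{2e^{-z}, \varepsilon\},$$

leading to Equation (4.4). □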
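For the non-adaptive box-kernel mean with Gaussian observations the target probability is known in closed form: $N_i\mathcal{KL}(\bar\theta_i, \theta) = N_i(\bar\theta_i - \theta)^2/(2\sigma^2)$ is distributed as $\chi^2_1/2$, so $\mathbb{P}(N_i\mathcal{KL}(\bar\theta_i, \theta) > z) = \operatorname{erfc}(\sqrt{z})$. This makes it a convenient test case for the relative frequency (4.3). The sketch below is a one-dimensional illustration under these assumptions (regular design, weights equal to one inside a window of half-width 5, interior points only); it does not use the adaptive estimator, and the function name is hypothetical.

```python
import math
import random


def relative_frequency(y, theta, sigma, half_width, z):
    """Estimator (4.3) for the non-adaptive box-kernel mean on a regular 1-D design.

    Only interior points, whose window of half-width `half_width` is not cut by
    the boundary, are used (the set X° of Section 4.2).
    """
    n = len(y)
    prefix = [0.0]
    for value in y:
        prefix.append(prefix[-1] + value)
    big_n = 2 * half_width + 1               # N_i: all weights in the window equal one
    interior = range(half_width, n - half_width)
    hits = 0
    for i in interior:
        mean = (prefix[i + half_width + 1] - prefix[i - half_width]) / big_n
        if big_n * (mean - theta) ** 2 / (2.0 * sigma ** 2) > z:   # N_i KL(theta_bar_i, theta) > z
            hits += 1
    return hits / len(interior)


rng = random.Random(7)
y = [rng.gauss(2.0, 0.7) for _ in range(200_000)]
print(relative_frequency(y, 2.0, 0.7, 5, 1.0), math.erfc(1.0))
```

Neighbouring windows overlap, so the indicators are correlated; this is why Lemma 4.6 bounds the variance through the largest single variance rather than through an average.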
Remark 4.7. Theorem 1 provides a meaningful result only if $\varepsilon := C_\varepsilon n^{-q}$ with $C_\varepsilon > 0$ and $q > 1$. We approximate the probability $\mathbb{P}\big(N_i^{(k)}\mathcal{KL}(\tilde\theta_i^{(k)}(\lambda), \theta) > z\big)$ by the corresponding relative frequency (4.3). This estimate can be calculated for $\varepsilon > 1/n$ only. Additionally, it becomes unstable if $\varepsilon$ is close to $1/n$. In case of a regular design, the sample can be extended in a natural way, allowing arbitrary sample sizes and, as a consequence, any $\varepsilon > 0$. Otherwise, that is, for random or irregular designs, we can achieve $\varepsilon := C_\varepsilon n^{-q}$ with $C_\varepsilon > 0$ and $q > 1$ only by applying the propagation condition to an artificial data set with $m$ design points, where $m \gg n$. In this case, one should evaluate carefully under which conditions the propagation condition generalizes from the artificial data set to the data set at hand.

4.3. Generalization of the setting. Assumption (A1), and hence the whole study, was restricted to the case $\mathbb{E}_\theta[T(Y)] = \theta$. Which modifications and additional assumptions are required in order to carry the previous results over to the case where $t(\vartheta) := \mathbb{E}_\vartheta[T(Y)]$ is some invertible function?

As mentioned in Remark 2.2, $\mathbb{E}_\theta[T(Y)] = \theta$ for all $\theta \in \Theta$ can be achieved via reparametrization. Estimation of a parameter $\vartheta$ with $t(\vartheta) := \mathbb{E}_\vartheta[T(Y)] \ne \vartheta$ can still be done for invertible functions by setting $\tilde\vartheta_i^{(k)} := t^{-1}(\tilde\theta_i^{(k)})$ for all $i \in \{1, \ldots, n\}$ and $k \in \{0, \ldots, k^*\}$, where $\tilde\theta_i^{(k)}$ denotes the adaptive estimator resulting from Algorithm 1. Hence, the algorithm remains unmodified. We will see that all results in Sections 3, 4.1 and 4.2 remain valid if $t(\vartheta)$ is linear in $\vartheta$. This generalizes our previous results to the Gamma, Erlang, Rayleigh, Binomial, and negative Binomial distributions, see Appendix B.

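Assumption A1g (Parametrized exponential family model). $\mathcal{P} = (P_\vartheta, \vartheta \in \Theta)$ is an exponential family with a compact and convex parameter set $\Theta$ and strictly monotone functions $C_t, B_t \in C^2(\Theta, \mathbb{R})$ such that

$$p_t(y, \vartheta) := \frac{dP_\vartheta}{d\mu}(y) = p(y)\exp\big[T(y)C_t(\vartheta) - B_t(\vartheta)\big], \qquad \vartheta \in \Theta,$$

where $T\colon \mathcal{Y} \to \mathbb{R}$ and $p(y)$ is some non-negative function on $\mathcal{Y}$. For the parameter $\vartheta$ it holds

$$\int p_t(y, \vartheta)\,\mu(dy) = 1 \qquad\text{and}\qquad \mathbb{E}_\vartheta[T(Y)] = \frac{B_t'(\vartheta)}{C_t'(\vartheta)} =: t(\vartheta),$$

where $t\colon \Theta \to t(\Theta)$ denotes an invertible and continuously differentiable function.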



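Corollary 4.8. Let Assumption A1g be satisfied. Reparametrization with $\theta := t(\vartheta)$ yields

(4.5) $$\mathcal{KL}(\vartheta_1, \vartheta_2) = \mathcal{KL}(\theta_1, \theta_2) \qquad\text{for all } \vartheta_1, \vartheta_2 \in \Theta, \text{ where } \theta_l := t(\vartheta_l).$$

If $t(\vartheta)$ is linear in $\vartheta$, then it follows for the adaptive estimator $\tilde\vartheta := t^{-1}(\tilde\theta)$ that

(4.6) $$\mathcal{KL}\big(\tilde\vartheta, \mathbb{E}\tilde\vartheta\big) = \mathcal{KL}\big(\tilde\theta, \mathbb{E}\tilde\theta\big).$$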

If $t(\vartheta)$ is linear in $\vartheta$ and if the adaptive estimator of $\vartheta$ is defined by $\tilde\vartheta_i^{(k)} := t^{-1}(\tilde\theta_i^{(k)})$ for all $i \in \{1, \ldots, n\}$ and $k \in \{0, \ldots, k^*\}$, then it follows from Corollary 4.8 that all previous results remain valid under Assumption A1g, where the formulations of the propagation condition and of Assumptions (A2) and (A3) can be adapted to the generalized setting via $\vartheta = t^{-1}(\theta)$.
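Equation (4.5) states that the Kullback-Leibler divergence does not change under the reparametrization $\theta = t(\vartheta)$. For the Rayleigh family with scale $\vartheta$ (standard parametrization), $T(Y) = Y^2$ is exponential with mean $2\vartheta^2$. The Python sketch below integrates the Rayleigh divergence numerically and compares it with the exponential formula $\theta/\theta' - 1 - \ln(\theta/\theta')$ evaluated at $\theta = 2\vartheta^2$; grid size, truncation point and function names are arbitrary choices.

```python
import math


def rayleigh_pdf(y, v):
    """Rayleigh density with scale v: (y / v^2) exp(-y^2 / (2 v^2))."""
    return y / v ** 2 * math.exp(-y * y / (2.0 * v * v))


def kl_rayleigh_numeric(v1, v2, upper=60.0, steps=20_000):
    """Simpson rule for int_0^upper f_v1(y) ln(f_v1(y) / f_v2(y)) dy (steps even)."""
    h = upper / steps
    total = 0.0
    for i in range(1, steps + 1):            # the integrand vanishes at y = 0
        y = i * h
        # log-ratio written out analytically to avoid 0/0 far in the tail
        log_ratio = 2.0 * math.log(v2 / v1) - y * y / (2.0 * v1 * v1) + y * y / (2.0 * v2 * v2)
        weight = 1 if i == steps else (4 if i % 2 else 2)
        total += weight * rayleigh_pdf(y, v1) * log_ratio
    return total * h / 3.0


def kl_exponential(t1, t2):
    """KL between exponential laws with means t1 and t2."""
    return t1 / t2 - 1.0 - math.log(t1 / t2)


for v1, v2 in [(1.2, 2.0), (2.5, 0.8)]:
    print(kl_rayleigh_numeric(v1, v2), kl_exponential(2 * v1 ** 2, 2 * v2 ** 2))
```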

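The exponential bound (PS 2) is the only result where we really need that $\mathbb{E}_\theta[T(Y)] = \theta$. All other proofs could be carried out directly, i.e. without the reparametrization $\theta = t(\vartheta)$. Here, the convexity of the Kullback-Leibler divergence w.r.t. the first argument holds if

$$\frac{\partial^2}{\partial\vartheta^2}\mathcal{KL}(\vartheta, \vartheta') = t''(\vartheta)\big[C_t(\vartheta) - C_t(\vartheta')\big] + t'(\vartheta)C_t'(\vartheta) > 0.$$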
Then, the proof of (PS 2) can be generalized supposing Assumption A1g and an additional inequality on $D_t(\nu) := B_t(\vartheta)$ with $\nu := C_t(\vartheta)$, evaluated at weighted sums over $j = 1, \ldots, n$ of the parameters $\nu_j$. However, for many parametric families this inequality is violated. That is why we prefer to apply (PS 2) in its original form, where $\mathbb{E}_\theta[T(Y)] = \theta$, and generalize the exponential bound afterwards via Equation (4.6).
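Under Assumption A1g the divergence reads $\mathcal{KL}(\vartheta, \vartheta') = t(\vartheta)[C_t(\vartheta) - C_t(\vartheta')] - [B_t(\vartheta) - B_t(\vartheta')]$, and since $t\,C_t' = B_t'$ its second derivative in the first argument is $t''(\vartheta)[C_t(\vartheta) - C_t(\vartheta')] + t'(\vartheta)C_t'(\vartheta)$. The Python sketch below compares this expression with a finite-difference second derivative for two families in the standard scale parametrization (Rayleigh with $t(\vartheta) = 2\vartheta^2$, $C_t(\vartheta) = -1/(2\vartheta^2)$, $B_t(\vartheta) = 2\ln\vartheta$; Weibull with shape 2.5, $t(\vartheta) = \vartheta^{2.5}$, $C_t(\vartheta) = -\vartheta^{-2.5}$, $B_t(\vartheta) = 2.5\ln\vartheta$). These concrete families, evaluation points and step sizes are illustrative assumptions.

```python
import math


def kl_family(t, c, b, v, w):
    """KL(v, w) = t(v) [C_t(v) - C_t(w)] - [B_t(v) - B_t(w)] under Assumption A1g."""
    return t(v) * (c(v) - c(w)) - (b(v) - b(w))


def d1(fun, x, h=1e-5):
    """Central first difference."""
    return (fun(x + h) - fun(x - h)) / (2.0 * h)


def d2(fun, x, h=1e-4):
    """Central second difference."""
    return (fun(x + h) - 2.0 * fun(x) + fun(x - h)) / (h * h)


def convexity_terms(t, c, b, v, w):
    """Numerical d^2/dv^2 KL(v, w) and the expression t''(v)[C_t(v) - C_t(w)] + t'(v) C_t'(v)."""
    numeric = d2(lambda x: kl_family(t, c, b, x, w), v)
    formula = d2(t, v) * (c(v) - c(w)) + d1(t, v) * d1(c, v)
    return numeric, formula


# Rayleigh family: t(v) = 2 v^2, C_t(v) = -1/(2 v^2), B_t(v) = 2 ln v
rayleigh = (lambda v: 2 * v * v, lambda v: -1 / (2 * v * v), lambda v: 2 * math.log(v))
# Weibull family with shape 2.5: t(v) = v^2.5, C_t(v) = -v^-2.5, B_t(v) = 2.5 ln v
weibull = (lambda v: v ** 2.5, lambda v: -v ** -2.5, lambda v: 2.5 * math.log(v))
for fam in (rayleigh, weibull):
    print(convexity_terms(*fam, 1.3, 0.7))
```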

5. Conclusion 

This study provides theoretical properties for a simplified version of the Propagation-Separation approach, where the memory step is removed from the algorithm. In particular, we have verified the following results, which may help to better understand the procedure.



• In Section 2.3 we introduced an advanced parameter choice strategy for the adaptation bandwidth $\lambda$. Its dependence on the unknown parameter function is analyzed in Section 4.1, showing for the first time theoretical and numerical results that justify the propagation condition.
• This parameter choice yields strong results on propagation and stability of estimates for piecewise constant functions with sharp discontinuities, see Section 3.
• In Remark 3.2 we proposed a slight modification of the algorithm ensuring Assumption (A2), on which the results in Section 3 were partially based.
• Finally, we gave some more details concerning the application of the propagation condition in practice, see Section 4.2, and a generalization of the assumed setting, see Section 4.3.

The behavior of the algorithm, and hence the achievable quality of estimation, depends mainly on the extension of the homogeneous compartments, on the smoothness of the parameter function $\theta(\cdot)$, and, via the adaptation bandwidth $\lambda$, on the parametric family $\mathcal{P} = (P_\theta)_{\theta\in\Theta}$ of probability distributions. Our theoretical results give an intuition of the interplay of propagation and separation during iteration. Future research may concentrate on the case of model misspecification in order to justify the heuristic observations in Section 2.4 mathematically.

Appendix A. Exponential bound and technical lemma 



We recall two results which have been proven in Polzehl and Spokoiny [2006, Lemma 5.2, Theorem 2.1].



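PS 1 (Technical Lemma). Under Assumption (A1) it holds

$$\mathcal{KL}^{1/2}(\theta_0, \theta_m) \le \varkappa\sum_{l=1}^m \mathcal{KL}^{1/2}(\theta_{l-1}, \theta_l)$$

for any sequence $\theta_0, \theta_1, \ldots, \theta_m \in \Theta$, where $\varkappa > 0$ is as in Lemma 2.3.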



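PS 2 (Exponential bound). If $\theta(\cdot) \equiv \theta$ and Assumption (A1) is satisfied, then it holds

$$\mathbb{P}\big(N\mathcal{KL}(\bar\theta, \theta) > z\big) \le 2e^{-z}, \qquad z > 0,$$

where $N := \sum_{j=1}^n w_j$ and $\bar\theta := \sum_{j=1}^n w_jT(Y_j)/N$ for given weights $w_j \in [0, 1]$.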
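(PS 2) can be probed by simulation. The sketch below draws exponential observations with mean $\theta$ (so that $T(Y) = Y$ and $\mathbb{E}_\theta[T(Y)] = \theta$), forms the weighted mean with fixed weights in $[0, 1]$, and compares the Monte Carlo exceedance rate of $N\mathcal{KL}(\bar\theta, \theta)$ with $2e^{-z}$. The weights, $\theta$, the number of replications, the seed and the function names are arbitrary; a simulation can only illustrate the bound, not prove it.

```python
import math
import random


def kl_exponential(a, b):
    """KL between exponential laws with means a and b (parametrization E[T(Y)] = theta)."""
    return a / b - 1.0 - math.log(a / b)


def exceedance_rate(theta, weights, z, reps, seed=0):
    """Monte Carlo estimate of P(N KL(theta_bar, theta) > z) for Y_j iid exponential with mean theta."""
    rng = random.Random(seed)
    big_n = sum(weights)
    hits = 0
    for _ in range(reps):
        # random.expovariate takes the rate, i.e. 1 / mean
        theta_bar = sum(w * rng.expovariate(1.0 / theta) for w in weights) / big_n
        if big_n * kl_exponential(theta_bar, theta) > z:
            hits += 1
    return hits / reps


weights = [1.0, 0.8, 0.8, 0.5, 0.5, 0.2, 0.2]    # arbitrary weights in [0, 1]
for z in (1.0, 2.0, 3.0):
    print(z, exceedance_rate(2.0, weights, z, reps=20_000), 2.0 * math.exp(-z))
```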
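Appendix B. Examples for parametric families

$\mathcal{P}$, support($\mu$) | $\Theta$ | $p(y)$ | $T(y)$ | $C_t(\vartheta)$ | $B_t(\vartheta)$ | $\mathbb{E}_\vartheta[T(Y)]$
$\mathcal{N}(\vartheta, \sigma^2)$, $y \in \mathbb{R}$ | $\mathbb{R}$ | $e^{-y^2/(2\sigma^2)}/(\sqrt{2\pi}\,\sigma)$ | $y$ | $\vartheta/\sigma^2$ | $\vartheta^2/(2\sigma^2)$ | $\vartheta$
$\log\mathcal{N}(\vartheta, \sigma^2)$, $y \in (0, \infty)$ | $\mathbb{R}$ | $e^{-(\ln y)^2/(2\sigma^2)}/(\sqrt{2\pi}\,\sigma y)$ | $\ln y$ | $\vartheta/\sigma^2$ | $\vartheta^2/(2\sigma^2)$ | $\vartheta$
$\Gamma(p, \vartheta)$, $y \in (0, \infty)$ | $(0, \infty)$ | $y^{p-1}/\Gamma(p)$ | $y$ | $-1/\vartheta$ | $p\ln\vartheta$ | $p\vartheta$
$\mathrm{Exp}(1/\vartheta)$, $y \in [0, \infty)$ | $(0, \infty)$ | $1$ | $y$ | $-1/\vartheta$ | $\ln\vartheta$ | $\vartheta$
$\mathrm{Erlang}(n, 1/\vartheta)$, $y \in [0, \infty)$ | $(0, \infty)$ | $y^{n-1}/(n-1)!$ | $y$ | $-1/\vartheta$ | $n\ln\vartheta$ | $n\vartheta$
$\mathrm{Rayleigh}(\vartheta)$, $y \in [0, \infty)$ | $(0, \infty)$ | $y$ | $y^2$ | $-1/(2\vartheta^2)$ | $2\ln\vartheta$ | $2\vartheta^2$
$\mathrm{Weibull}(\vartheta, k)$, $y \in [0, \infty)$ | $(0, \infty)$ | $k\,y^{k-1}$ | $y^k$ | $-1/\vartheta^k$ | $k\ln\vartheta$ | $\vartheta^k$
$\mathrm{Pareto}(x_m, \vartheta)$, $y \in [x_m, \infty)$ | $(1, \infty)$ | $1/y$ | $\ln(y/x_m)$ | $-\vartheta$ | $-\ln\vartheta$ | $1/\vartheta$

Table 1. One-parametric exponential families which satisfy Assumption (A1g): continuous distributions.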



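$\mathcal{P}$, support($\mu$) | $\Theta$ | $p(y)$ | $T(y)$ | $C_t(\vartheta)$ | $B_t(\vartheta)$ | $\mathbb{E}_\vartheta[T(Y)]$
$\mathrm{Poiss}(\vartheta)$, $y := k \in \mathbb{N}_0$ | $(0, \infty)$ | $1/k!$ | $k$ | $\ln\vartheta$ | $\vartheta$ | $\vartheta$
$\mathrm{Bin}(n, \vartheta)$, $y := k \in \{0, 1, \ldots, n\}$ | $(0, 1)$ | $\binom{n}{k}$ | $k$ | $\ln\frac{\vartheta}{1-\vartheta}$ | $-n\ln(1-\vartheta)$ | $n\vartheta$
$\mathrm{NegativeBin}(r, \vartheta)$, $y := k \in \mathbb{N}_0$ | $(0, 1)$ | $\binom{k+r-1}{k}$ | $k$ | $\ln\vartheta$ | $-r\ln(1-\vartheta)$ | $\frac{r\vartheta}{1-\vartheta}$
$\mathrm{Bernoulli}(\vartheta)$, $y := k \in \{0, 1\}$ | $(0, 1)$ | $1$ | $k$ | $\ln\frac{\vartheta}{1-\vartheta}$ | $-\ln(1-\vartheta)$ | $\vartheta$

Table 2. One-parametric exponential families which satisfy Assumption (A1g): discrete distributions.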
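Entries of a table like Table 2 can be cross-checked against Assumption A1g: the probability mass function $p(y)\exp[T(y)C_t(\vartheta) - B_t(\vartheta)]$ must sum to one, and its mean must equal $B_t'(\vartheta)/C_t'(\vartheta)$. The sketch below does this for the Poisson, Binomial, negative Binomial and Bernoulli families with $T(y) = y$, written in their standard exponential-family forms; the values of $n$, $r$ and $\vartheta$, the truncation of the infinite supports and all names are arbitrary illustrative choices.

```python
import math

N_TRIALS, R_PARAM = 10, 3   # illustrative values of n (Binomial) and r (negative Binomial)

# name: (log p(k), support, C_t, B_t, claimed E[T(Y)], parameter value)
ROWS = {
    "Poisson": (lambda k: -math.lgamma(k + 1), range(150),
                lambda v: math.log(v), lambda v: v, lambda v: v, 3.5),
    "Binomial": (lambda k: math.log(math.comb(N_TRIALS, k)), range(N_TRIALS + 1),
                 lambda v: math.log(v / (1 - v)), lambda v: -N_TRIALS * math.log(1 - v),
                 lambda v: N_TRIALS * v, 0.3),
    "NegativeBinomial": (lambda k: math.log(math.comb(k + R_PARAM - 1, k)), range(400),
                         lambda v: math.log(v), lambda v: -R_PARAM * math.log(1 - v),
                         lambda v: R_PARAM * v / (1 - v), 0.4),
    "Bernoulli": (lambda k: 0.0, range(2),
                  lambda v: math.log(v / (1 - v)), lambda v: -math.log(1 - v), lambda v: v, 0.5),
}


def check_family(log_p, support, c, b, claimed_mean, v, h=1e-6):
    """Total mass, mean of T(Y) = Y, difference quotient B_t'/C_t', and the claimed E[T(Y)]."""
    pmf = [math.exp(log_p(k) + k * c(v) - b(v)) for k in support]
    mass = sum(pmf)
    mean = sum(k * p for k, p in zip(support, pmf))
    ratio = (b(v + h) - b(v - h)) / (c(v + h) - c(v - h))
    return mass, mean, ratio, claimed_mean(v)


for name, row in ROWS.items():
    print(name, check_family(*row))
```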